868 results for Debugging in computer science.


Abstract:

This paper gives an overview of the INEX 2008 Ad Hoc Track. The main goals of the Ad Hoc Track were two-fold. The first goal was to investigate the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. This is a continuation of INEX 2007 and, for this reason, the retrieval results are liberalized to arbitrary passages and measures were chosen to fairly compare systems retrieving elements, ranges of elements, and arbitrary passages. The second goal was to compare focused retrieval to article retrieval more directly than in earlier years. For this reason, standard document retrieval rankings have been derived from all runs, and evaluated with standard measures. In addition, a set of queries targeting Wikipedia has been derived from a proxy log, and the runs are also evaluated against the clicked Wikipedia pages. The INEX 2008 Ad Hoc Track featured three tasks. For the Focused Task, a ranked list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task, non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task, a single starting point (element start tag or passage start) for each article was needed. We discuss the results for the three tasks, and examine the relative effectiveness of element and passage retrieval. This is examined in the context of content only (CO, or keyword) search as well as content and structure (CAS, or structured) search. Finally, we look at the ability of focused retrieval techniques to rank articles, using standard document retrieval techniques, both against the judged topics and against queries and clicks from a proxy log.

Abstract:

Searching for relevant peer-reviewed material is an integral part of the work of corporate and academic researchers. Researchers collect huge amounts of information over the years and sometimes struggle to organize it. Based on a study with 30 academic researchers, we explore, in combination, different searching and archiving activities for document-based information. Based on our results, we provide several implications for design.

Abstract:

This presentation will explore how BPM research can seamlessly combine the academic requirement of rigor with the aim to impact the practice of Business Process Management. After a brief introduction to the research agendas as they are perceived by different BPM communities, two research projects will be discussed that illustrate how empirically informed quantitative and qualitative research, combined with design science, can lead to outcomes that BPM practitioners are willing to adopt. The first project studies the practice of process modeling using Information Systems theory, and demonstrates how a better understanding of this practice can inform the design of modeling notations and methods. The second project studies the adoption of process management within organizations, and leads to models of how organizations can incrementally transition to greater levels of BPM maturity. The presentation will conclude with recommendations for how the BPM research and practitioner communities can increasingly benefit from each other.

Abstract:

With the growing size and variety of social media files on the web, it is becoming critical to efficiently organize them into clusters for further processing. This paper presents a novel scalable constrained document clustering method that harnesses the power of search engines capable of dealing with large text data. Instead of calculating the distance between the documents and all of the clusters’ centroids, a neighborhood of best cluster candidates is chosen using a document ranking scheme. To make the method faster and less memory-dependent, in-memory and in-database processing are combined in a semi-incremental manner. This method has been extensively tested in the social event detection application. Empirical analysis shows that the proposed method is efficient both in computation and memory usage while producing notable accuracy.

Abstract:

We introduce Claude Lévi-Strauss's canonical formula (CF), an attempt to rigorously formalise the general narrative structure of myth. This formula utilises the Klein group as its basis, but a recent work draws attention to its natural quaternion form, which opens up the possibility that it may require a quantum-inspired interpretation. We present the CF in a form that can be understood by a non-anthropological audience, using the formalisation of a key myth (that of Adonis) to draw attention to its mathematical structure. The future potential formalisation of mythological structure within a quantum-inspired framework is proposed and discussed, with a probabilistic interpretation further generalising the formula.

Abstract:

The top-k retrieval problem aims to find the optimal set of k documents from a number of relevant documents given the user’s query. The key issue is to balance the relevance and diversity of the top-k search results. In this paper, we address this problem using Facility Location Analysis taken from Operations Research, where the locations of facilities are optimally chosen according to some criteria. We show how this analysis technique is a generalization of state-of-the-art retrieval models for diversification (such as the Modern Portfolio Theory for Information Retrieval), which treat the top-k search results like “obnoxious facilities” that should be dispersed as far as possible from each other. However, Facility Location Analysis suggests that the top-k search results could be treated like “desirable facilities” to be placed as close as possible to their customers. This leads to a new top-k retrieval model where the best representatives of the relevant documents are selected. In a series of experiments conducted on two TREC diversity collections, we show that significant improvements can be made over the current state-of-the-art through this alternative treatment of the top-k retrieval problem.
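
The "desirable facilities" idea above can be illustrated with a small sketch. This is not the model from the paper; it is a generic greedy heuristic for a facility-location style objective, in which each relevant document is a "customer" served by its most similar selected document. The function name `greedy_facility_location` and the `similarity` matrix are hypothetical and chosen only for illustration.

```python
def greedy_facility_location(similarity, k):
    """Greedily pick k documents that best 'represent' all documents.

    similarity[i][j] is an assumed similarity between documents i and j.
    The objective is sum_i max_{j in chosen} similarity[i][j]; each step
    adds the document with the largest marginal gain in that objective.
    """
    n = len(similarity)
    chosen = []
    best_cover = [0.0] * n  # best similarity of each document to the chosen set
    for _ in range(min(k, n)):
        best_gain, best_doc = -1.0, None
        for cand in range(n):
            if cand in chosen:
                continue
            # Marginal gain: how much closer each document gets to a representative.
            gain = sum(max(similarity[i][cand] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_doc = gain, cand
        chosen.append(best_doc)
        best_cover = [max(best_cover[i], similarity[i][best_doc]) for i in range(n)]
    return chosen


# Toy data: documents 0-2 form one group, documents 3-4 another.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]
representatives = greedy_facility_location(sim, k=2)
```

On this toy matrix the first pick is the central document of the larger group and the second comes from the other group, so the two picks act as representatives of the two groups rather than being pushed apart for their own sake.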

Abstract:

Novelty-biased cumulative gain (α-NDCG) has become the de facto measure within the information retrieval (IR) community for evaluating retrieval systems in the context of sub-topic retrieval. Setting an incorrect value of the parameter α in α-NDCG prevents the measure from behaving as desired in particular circumstances. In fact, when α is set according to common practice (i.e. α = 0.5), the measure favours systems that promote redundant relevant sub-topics rather than provide novel relevant ones. Recognising this characteristic of the measure is important because it affects the comparison and the ranking of retrieval systems. We propose an approach to overcome this problem by defining a safe threshold for the value of α on a per-query basis. Moreover, we study its impact on system rankings through a comprehensive simulation.
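
The effect described above can be reproduced on a toy example. The sketch below computes the un-normalised α-DCG score under the usual definition, in which each repeated sub-topic is discounted by a factor of (1 − α) and each rank by log2(rank + 1). It does not implement the threshold proposed in the paper; the function name `alpha_dcg` and the toy rankings are assumptions made for illustration.

```python
import math


def alpha_dcg(ranking, alpha=0.5):
    """Un-normalised alpha-DCG of a ranking.

    ranking is a list of sets; each set holds the sub-topic ids for which
    the document at that rank is judged relevant.
    """
    seen = {}  # sub-topic id -> number of earlier documents covering it
    score = 0.0
    for rank, subtopics in enumerate(ranking, start=1):
        # A sub-topic already seen c times contributes (1 - alpha) ** c.
        gain = sum((1 - alpha) ** seen.get(s, 0) for s in subtopics)
        score += gain / math.log2(rank + 1)
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return score


# Second document either repeats three known sub-topics or adds one new one.
redundant = [{1, 2, 3}, {1, 2, 3}]
novel = [{1, 2, 3}, {4}]

prefers_redundant_at_half = alpha_dcg(redundant, 0.5) > alpha_dcg(novel, 0.5)
prefers_novel_at_high_alpha = alpha_dcg(novel, 0.9) > alpha_dcg(redundant, 0.9)
```

With α = 0.5 the redundant document earns a gain of 3 × 0.5 = 1.5 at rank 2, against 1.0 for the novel one, so the redundant ranking scores higher. With a larger α the order flips in this example.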

Abstract:

The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP’s limitations. All alternatives deviate from the PRP by incorporating dependencies. This results in a re-ranking that promotes or demotes documents depending upon their relationship with the documents that have been already ranked. In this paper, we compare and contrast the behaviour of state-of-the-art ranking strategies and principles. To do so, we tease out analytical relationships between the ranking approaches and we investigate the document kinematics to visualise the effects of the different approaches on document ranking.

Abstract:

In this paper, we consider the problem of document ranking in a non-traditional retrieval task, called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how the ranks of documents are revised for promoting diversity. In the first approach, subtopic diversification is achieved implicitly, by choosing documents that are different from each other, while in the second approach this is done explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
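
As an illustration of the implicit paradigm, the sketch below uses Maximal Marginal Relevance (MMR), a well-known re-ranking heuristic that trades relevance against similarity to documents already selected. The abstract does not say which methods the paper evaluates, so MMR is swapped in here purely as an example; the name `mmr_rerank`, the trade-off parameter `lam`, and the toy scores are assumptions.

```python
def mmr_rerank(relevance, similarity, k, lam=0.5):
    """Select k documents by Maximal Marginal Relevance.

    relevance[d] is an assumed query-document relevance score and
    similarity[d][s] an assumed document-document similarity.
    lam = 1.0 ranks by relevance alone; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(d):
            # Penalise a candidate by its closest already-selected document.
            redundancy = max((similarity[d][s] for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse = mmr_rerank(relevance, similarity, k=2, lam=0.5)
by_relevance = mmr_rerank(relevance, similarity, k=2, lam=1.0)
```

With `lam=0.5` the near-duplicate is skipped in favour of the different document, while `lam=1.0` reduces to plain relevance ranking.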

Abstract:

Existing techniques for automated discovery of process models from event logs largely focus on extracting flat process models. In other words, they fail to exploit the notion of subprocess, as well as structured error handling and repetition constructs provided by contemporary process modeling notations, such as the Business Process Model and Notation (BPMN). This paper presents a technique for automated discovery of BPMN models containing subprocesses, interrupting and non-interrupting boundary events, and loop and multi-instance markers. The technique analyzes dependencies between data attributes associated with events, in order to identify subprocesses and to extract their associated logs. Parent process and subprocess models are then discovered separately using existing techniques for flat process model discovery. Finally, the resulting models and logs are heuristically analyzed in order to identify boundary events and markers. A validation with one synthetic and two real-life logs shows that process models derived using the proposed technique are more accurate and less complex than those derived with flat process model discovery techniques.

Abstract:

While the Probability Ranking Principle for Information Retrieval provides the basis for formal models, it makes a very strong assumption regarding the dependence between documents. However, it has been observed that in real situations this assumption does not always hold. In this paper we propose a reformulation of the Probability Ranking Principle based on quantum theory. Quantum probability theory naturally includes interference effects between events. We posit that this interference captures the dependency between judgements of document relevance. The outcome is a more sophisticated principle, the Quantum Probability Ranking Principle, that provides a more sensitive ranking which caters for interference/dependence between documents’ relevance.

Abstract:

Semantic Space models, which provide a numerical representation of words’ meaning extracted from a corpus of documents, have been formalized in terms of Hermitian operators over real-valued Hilbert spaces by Bruza et al. [1]. The collapse of a word into a particular meaning has been investigated by applying the notion of quantum collapse of superpositional states [2]. While the semantic association between words in a Semantic Space can be computed by means of the Minkowski distance [3] or the cosine of the angle between the vector representations of each pair of words, a new procedure is needed in order to establish relations between two or more Semantic Spaces. We address the question: how can the distance between different Semantic Spaces be computed? By representing each Semantic Space as a subspace of a more general Hilbert space, the relationship between Semantic Spaces can be computed by means of the subspace distance. Such a distance needs to take into account the difference in the dimensions between subspaces. The availability of a distance for comparing different Semantic Subspaces would enable a deeper understanding of the geometry of Semantic Spaces, which would possibly translate into better effectiveness in Information Retrieval tasks.
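
The two word-level measures mentioned above, the Minkowski distance and the cosine of the angle between word vectors, can be sketched as follows. This covers only association within a single Semantic Space; the subspace distance between spaces is not defined in the abstract and is not attempted here. The function names and the toy vectors are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def minkowski(u, v, p=2):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)


# Hypothetical three-dimensional word vectors.
word_a = [1.0, 2.0, 0.0]
word_b = [2.0, 4.0, 0.0]  # same direction as word_a, twice the length
word_c = [0.0, 0.0, 3.0]  # orthogonal to both

same_direction = cosine(word_a, word_b)
orthogonal = cosine(word_a, word_c)
euclidean = minkowski(word_a, word_b, p=2)
```

The example shows why the two measures can disagree: `word_a` and `word_b` point in the same direction, so their cosine is close to 1, yet their Minkowski distance is non-zero because their lengths differ.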

Abstract:

Researchers are increasingly grappling with ways of theorizing social media and its use. This review essay proposes that the theory of Information Grounds (IG) may provide a valuable lens for understanding how social media fosters collaboration and social engagement among information professionals. The paper presents literature that helps us understand how social media can be seen as IG, and maps the characteristics of social media to the seven propositions of IG theory. This work is part of a wider study investigating the ways in which Information Technology (IT) professionals experience social media.

Abstract:

Digital signatures are often used by trusted authorities to make unique bindings between a subject and a digital object; for example, certificate authorities certify that a public key belongs to a domain name, and time-stamping authorities certify that a certain piece of information existed at a certain time. Traditional digital signature schemes, however, impose no uniqueness conditions, so a trusted authority could make multiple certifications for the same subject but different objects, be it intentionally, by accident, or following a (legal or illegal) coercion. We propose the notion of a double-authentication-preventing signature, in which a value to be signed is split into two parts: a subject and a message. If a signer ever signs two different messages for the same subject, enough information is revealed to allow anyone to compute valid signatures on behalf of the signer. This double-signature forgeability property discourages signers from misbehaving (a form of self-enforcement) and would give binding authorities like CAs some cryptographic arguments to resist legal coercion. We give a generic construction using a new type of trapdoor function with extractability properties, which we show can be instantiated using the group of sign-agnostic quadratic residues modulo a Blum integer.

Abstract:

Recurrence relations in mathematics form a very powerful and compact way of looking at a wide range of relationships. Traditionally, the concept of recurrence has often been a difficult one for the secondary teacher to convey to students. Closely related to the powerful proof technique of mathematical induction, recurrences are able to capture many relationships in formulas much simpler than so-called direct or closed formulas. In computer science, recursive coding often has a similar compactness property, and, perhaps not surprisingly, suffers from similar problems in the classroom as recurrences: the students often find both the basic concepts and practicalities elusive. Using models designed to illuminate the relevant principles for the students, we offer a range of examples which use the modern spreadsheet environment to powerfully illustrate the great expressive and computational power of recurrences.
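
The spreadsheet style of recurrence described above, in which each cell is computed from the cell above it, can be mimicked in a few lines. The sketch below is not taken from the paper; the helper `fill_column` and the choice of the triangular numbers (recurrence a(n) = a(n-1) + n, closed formula n(n+1)/2) are assumptions made for illustration.

```python
def fill_column(first, step, rows):
    """Fill a 'spreadsheet column' where each cell depends on the one above.

    first is the value in the top cell; step(prev, n) returns the value of
    row n given the value prev of row n - 1.
    """
    column = [first]
    for n in range(1, rows):
        column.append(step(column[-1], n))
    return column


# Recurrence: a(0) = 0, a(n) = a(n - 1) + n.
triangular = fill_column(0, lambda prev, n: prev + n, rows=6)

# Closed formula for the same sequence: a(n) = n * (n + 1) / 2.
closed_form = [n * (n + 1) // 2 for n in range(6)]
```

Both lists hold the same six values. The recurrence only states how to get from one row to the next, whereas the closed formula has to be derived separately, for instance by mathematical induction.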