362 resultados para natural language


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Novice programmers have difficulty developing an algorithmic solution while simultaneously obeying the syntactic constraints of the target programming language. To see how students fare in algorithmic problem solving when not burdened by syntax, we conducted an experiment in which a large class of beginning programmers were required to write a solution to a computational problem in structured English, as if instructing a child, without reference to program code at all. The students produced an unexpectedly wide range of correct, and attempted, solutions, some of which had not occurred to their teachers. We also found that many common programming errors were evident in the natural language algorithms, including failure to ensure loop termination, hardwiring of solutions, failure to properly initialise the computation, and use of unnecessary temporary variables, suggesting that these mistakes are caused by inexperience at thinking algorithmically, rather than difficulties in expressing solutions as program code.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Current regulatory requirements on data privacy make it increasingly important for enterprises to be able to verify and audit their compliance with their privacy policies. Traditionally, a privacy policy is written in a natural language. Such policies inherit the potential ambiguity, inconsistency and mis-interpretation of natural text. Hence, formal languages are emerging to allow a precise specification of enforceable privacy policies that can be verified. The EP3P language is one such formal language. An EP3P privacy policy of an enterprise consists of many rules. Given the semantics of the language, there may exist some rules in the ruleset which can never be used, these rules are referred to as redundant rules. Redundancies adversely affect privacy policies in several ways. Firstly, redundant rules reduce the efficiency of operations on privacy policies. Secondly, they may misdirect the policy auditor when determining the outcome of a policy. Therefore, in order to address these deficiencies it is important to identify and resolve redundancies. This thesis introduces the concept of minimal privacy policy - a policy that is free of redundancy. The essential component for maintaining the minimality of privacy policies is to determine the effects of the rules on each other. Hence, redundancy detection and resolution frameworks are proposed. Pair-wise redundancy detection is the central concept in these frameworks and it suggests a pair-wise comparison of the rules in order to detect redundancies. In addition, the thesis introduces a policy management tool that assists policy auditors in performing several operations on an EP3P privacy policy while maintaining its minimality. Formal results comparing alternative notions of redundancy, and how this would affect the tool, are also presented.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Following an early claim by Nelson & McEvoy suggesting that word associations can display `spooky action at a distance behaviour', a serious investigation of the potentially quantum nature of such associations is currently underway. In this paper quantum theory is proposed as a framework suitable for modelling the mental lexicon, specifically the results obtained from both intralist and extralist word association experiments. Some initial models exploring this hypothesis are discussed, and they appear to be capable of substantial agreement with pre-existing experimental data. The paper concludes with a discussion of some experiments that will be performed in order to test these models.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information", or NGMI, which is used to segment Chinese documents into n-character words or phrases, using language statistics drawn from the Chinese Wikipedia corpus. The approach alleviates the tremendous effort that is required in preparing and maintaining the manually segmented Chinese text for training purposes, and manually maintaining ever expanding lexicons. Previously, mutual information was used to achieve automated segmentation into 2-character words. The NGMI approach extends the approach to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work presents an extended Joint Factor Analysis model including explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting model parameters such as retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model by providing competitive results over a wide range of utterance lengths without retraining and also yielding modest improvements in a number of conditions over current state-of-the-art.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a novel approach of estimating the confidence interval of speaker verification scores. This approach is utilised to minimise the utterance lengths required in order to produce a confident verification decision. The confidence estimation method is also extended to address both the problem of high correlation in consecutive frame scores, and robustness with very limited training samples. The proposed technique achieves a drastic reduction in the typical data requirements for producing confident decisions in an automatic speaker verification system. When evaluated on the NIST 2005 SRE, the early verification decision method demonstrates that an average of 5–10 seconds of speech is sufficient to produce verification rates approaching those achieved previously using an average in excess of 100 seconds of speech.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We argue that web service discovery technology should help the user navigate a complex problem space by providing suggestions for services which they may not be able to formulate themselves as (s)he lacks the epistemic resources to do so. Free text documents in service environments provide an untapped source of information for augmenting the epistemic state of the user and hence their ability to search effectively for services. A quantitative approach to semantic knowledge representation is adopted in the form of semantic space models computed from these free text documents. Knowledge of the user’s agenda is promoted by associational inferences computed from the semantic space. The inferences are suggestive and aim to promote human abductive reasoning to guide the user from fuzzy search goals into a better understanding of the problem space surrounding the given agenda. Experimental results are discussed based on a complex and realistic planning activity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

While spoken term detection (STD) systems based on word indices provide good accuracy, there are several practical applications where it is infeasible or too costly to employ an LVCSR engine. An STD system is presented, which is designed to incorporate a fast phonetic decoding front-end and be robust to decoding errors whilst still allowing for rapid search speeds. This goal is achieved through mono-phone open-loop decoding coupled with fast hierarchical phone lattice search. Results demonstrate that an STD system that is designed with the constraint of a fast and simple phonetic decoding front-end requires a compromise to be made between search speed and search accuracy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of un-transcribed speech for a particular speaker. This paper investigates how to best exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model adaptation are presented. Using one hour of automatically transcribed speech per speaker with a word error rate of 36.0%, unsupervised adaptation resulted in an absolute gain of 6.3%, equivalent to 70% of the gain from the supervised case, with additional adaptation data likely to yield further improvements. LM adaptation experiments suggested that although there seems to be a small degree of speaker idiolect, adaptation to the speaker alone, without considering the topic of the conversation, is in itself unlikely to improve transcription accuracy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Searching for multimedia is an important activity for users of Web search engines. Studying user's interactions with Web search engine multimedia buttons, including image, audio, and video, is important for the development of multimedia Web search systems. This article provides results from a Weblog analysis study of multimedia Web searching by Dogpile users in 2006. The study analyzes the (a) duration, size, and structure of Web search queries and sessions; (b) user demographics; (c) most popular multimedia Web searching terms; and (d) use of advanced Web search techniques including Boolean and natural language. The current study findings are compared with results from previous multimedia Web searching studies. The key findings are: (a) Since 1997, image search consistently is the dominant media type searched followed by audio and video; (b) multimedia search duration is still short (>50% of searching episodes are <1 min), using few search terms; (c) many multimedia searches are for information about people, especially in audio search; and (d) multimedia search has begun to shift from entertainment to other categories such as medical, sports, and technology (based on the most repeated terms). Implications for design of Web multimedia search engines are discussed.