946 results for Science and Technology System
Abstract:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. Learning to use user profiles effectively is one of the most challenging tasks in developing an IF system. With the document selection criteria better defined on the basis of the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness, and they are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of IF systems. On the other hand, pattern discovery from large data streams is not computationally efficient, and these approaches also have to deal with low-frequency pattern issues. The measures used by data mining techniques to learn the profile (for example, “support” and “confidence”) have turned out to be unsuitable for filtering and can lead to a mismatch problem. This thesis uses rough set-based reasoning (term-based) and pattern mining as a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: a topic filtering stage and a pattern mining stage. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user-profile learning method and a theoretical model of the threshold setting have been developed using rough set decision theory. The second stage (pattern mining) aims to solve the information mismatch problem and is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy; it assigns higher scores to the most likely relevant documents. Because relatively few documents are left after the first stage, the computational cost is markedly reduced; at the same time, pattern discovery yields more accurate results, and the overall performance of the system is improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms the other IF systems, including the traditional Rocchio IF model, state-of-the-art term-based models such as BM25 and Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
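To make the two-stage idea concrete, the following is a minimal sketch (not the thesis's actual implementation) of how such a filter could be wired together: a term-based topic score with a threshold removes the most likely irrelevant documents, and a pattern-based score re-ranks the survivors. The profile weights, patterns and threshold below are invented for illustration.

```python
# Minimal sketch of a two-stage information filter (illustrative only):
# stage 1 discards documents whose topic score falls below a threshold,
# stage 2 re-ranks the survivors with a pattern-based scoring function.

def topic_score(doc_terms, profile_weights):
    """Stage 1: crude term-based relevance score against a user profile."""
    return sum(profile_weights.get(t, 0.0) for t in doc_terms)

def pattern_score(doc_terms, patterns):
    """Stage 2: reward documents containing discovered term patterns."""
    doc = set(doc_terms)
    return sum(w for pattern, w in patterns if set(pattern) <= doc)

def two_stage_filter(docs, profile_weights, patterns, threshold=1.0):
    # Stage 1: topic filtering removes the most likely irrelevant documents.
    survivors = [d for d in docs if topic_score(d, profile_weights) >= threshold]
    # Stage 2: pattern-based ranking of the (much smaller) surviving set.
    return sorted(survivors, key=lambda d: pattern_score(d, patterns), reverse=True)

docs = [["rough", "set", "filtering"], ["football", "score"], ["pattern", "mining", "filtering"]]
profile = {"filtering": 1.0, "pattern": 0.8, "mining": 0.6}
patterns = [(("pattern", "mining"), 2.0), (("rough", "set"), 1.5)]
print(two_stage_filter(docs, profile, patterns))
```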
Abstract:
A strong designated verifier signature scheme makes it possible for a signer to convince a designated verifier that she has signed a message in such a way that the designated verifier cannot transfer the signature to a third party, and no third party can even verify the validity of a designated verifier signature. We show that anyone who intercepts one signature can verify subsequent signatures in the Zhang-Mao ID-based designated verifier signature scheme and the Lal-Verma ID-based designated verifier proxy signature scheme. We propose a new and efficient ID-based designated verifier signature scheme that is strong and unforgeable. As a direct corollary, we also obtain a new efficient ID-based designated verifier proxy signature scheme.
Abstract:
Network-based Intrusion Detection Systems (NIDSs) analyse network traffic to detect instances of malicious activity. Typically, this is only possible when the network traffic is accessible for analysis. With the growing use of Virtual Private Networks (VPNs) that encrypt network traffic, the NIDS can no longer access this crucial audit data. In this paper, we present an implementation and evaluation of our approach proposed in Goh et al. (2009). It is based on Shamir's secret-sharing scheme and allows a NIDS to function normally in a VPN without any modifications and without compromising the confidentiality afforded by the VPN.
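For readers unfamiliar with the underlying primitive, the sketch below shows textbook Shamir (t, n) secret sharing over a prime field: a secret is split into n shares and any t of them reconstruct it by Lagrange interpolation. This is only the primitive; the paper's actual integration of secret sharing with the NIDS and the VPN key material is not reproduced here.

```python
# Sketch of Shamir's (t, n) secret sharing over a prime field -- the primitive the
# approach builds on; this is not the paper's NIDS/VPN integration.
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a toy example

def split_secret(secret, n, t):
    """Create n shares; any t of them reconstruct the secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse of den via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, n=5, t=3)
assert reconstruct(shares[:3]) == 123456789
```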
Abstract:
Privacy enhancing protocols (PEPs) are a family of protocols that allow secure exchange and management of sensitive user information. They are important in preserving users’ privacy in today’s open environment. Proof of the correctness of PEPs is necessary before they can be deployed. However, the traditional provable security approach, though well established for verifying cryptographic primitives, is not applicable to PEPs. We apply the formal method of Coloured Petri Nets (CPNs) to construct an executable specification of a representative PEP, namely the Private Information Escrow Bound to Multiple Conditions Protocol (PIEMCP). Formal semantics of the CPN specification allow us to reason about various security properties of PIEMCP using state space analysis techniques. This investigation provides us with preliminary insights for modeling and verification of PEPs in general, demonstrating the benefit of applying the CPN-based formal approach to proving the correctness of PEPs.
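The essence of state space analysis is to enumerate every reachable state of the model and check a property in each of them. The toy sketch below conveys only that idea, using an invented three-phase exchange; it is in no way the PIEMCP CPN model, whose places, colours and transitions are far richer.

```python
# Toy illustration of state-space analysis: exhaustively explore the reachable states
# of a tiny protocol model and check a property in every state. This only conveys the
# idea behind CPN state-space verification, not the PIEMCP specification itself.
from collections import deque

def transitions(state):
    """Enumerate successor states of a minimal, invented two-party exchange."""
    phase, escrowed = state
    if phase == "init":
        yield ("request_sent", escrowed)
    if phase == "request_sent":
        yield ("escrowed", True)
    if phase == "escrowed" and escrowed:
        yield ("released", True)

def explore(initial):
    seen, queue = {initial}, deque([initial])
    while queue:
        s = queue.popleft()
        for nxt in transitions(s):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

states = explore(("init", False))
# Example property check: information is never released unless it was escrowed first.
assert all(not (phase == "released" and not escrowed) for phase, escrowed in states)
print(sorted(states))
```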
Abstract:
The overall research aims to develop a standardised instrument to measure the impacts resulting from contemporary Information Systems (IS). The research adopts the IS-Impact measurement model, introduced by Gable et al. (2008), as its theoretical foundation, and applies the extension strategy described by Berthon et al. (2002), extending both the theory and the context, where the new context is the Human Resource (HR) system. The research will be conducted in two phases: an exploratory phase and a specification phase. The purpose of this paper is to present the findings of the exploratory phase, in which 134 respondents from a major Australian university were involved. The findings support the credibility of most of the existing IS-Impact model. However, some of the textual data may suggest new measures for the IS-Impact model, while the low response rate for, or avoidance of, some items may suggest eliminating some measures from the model.
Abstract:
1. Species' distribution modelling relies on adequate data sets to build reliable statistical models with high predictive ability. However, the money spent collecting empirical data might be better spent on management. A less expensive source of species' distribution information is expert opinion. This study evaluates expert knowledge and its source. In particular, we determine whether models built on expert knowledge apply over multiple regions or only within the region where the knowledge was derived. 2. The case study focuses on the distribution of the brush-tailed rock-wallaby Petrogale penicillata in eastern Australia. We brought together substantial, well-designed field data and knowledge from nine experts, drawn from two biogeographically different regions. We used a novel elicitation tool within a geographical information system to systematically collect expert opinions. The tool utilized an indirect approach to elicitation, asking experts simpler questions about observable rather than abstract quantities, with measures in place to identify uncertainty and offer feedback. Bayesian analysis was used to combine field data and expert knowledge in each region to determine: (i) how expert opinion affected models based on field data and (ii) how similar expert-informed models were within regions and across regions. 3. The elicitation tool effectively captured the experts' opinions and their uncertainties. Experts were comfortable with the map-based elicitation approach used, especially with graphical feedback. Experts tended to predict lower values of species occurrence compared with field data. 4. Across experts, consensus on effect sizes occurred for several habitat variables. Expert opinion generally influenced predictions from field data. However, south-east Queensland and north-east New South Wales experts had different opinions on the influence of elevation and geology, with these differences attributable to geological differences between these regions. 5. Synthesis and applications. When formulated as priors in Bayesian analysis, expert opinion is useful for modifying or strengthening patterns exhibited by empirical data sets that are limited in size or scope. Nevertheless, the ability of an expert to extrapolate beyond their region of knowledge may be poor. Hence there is significant merit in obtaining information from local experts when compiling species' distribution models across several regions.
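The core mechanics behind point 5, treating an expert opinion as a prior that field data then updates, can be illustrated with a simple conjugate example. The Beta prior, the elicited occurrence probability and the survey counts below are invented; the study's actual habitat models are considerably richer.

```python
# Minimal illustration of expert opinion as a Bayesian prior: an expert's elicited
# probability of rock-wallaby occurrence at a site becomes a Beta prior, which field
# survey data then updates. Only the basic prior-plus-data mechanics are shown.
from scipy import stats

# Expert believes occupancy probability is around 0.6, with moderate confidence,
# encoded here as Beta(6, 4) (mean 0.6, "effective sample size" of 10 site visits).
prior_a, prior_b = 6, 4

# Hypothetical field data from the region: 12 surveyed sites, species detected at 3.
detections, surveys = 3, 12

# Conjugate update: posterior is Beta(a + detections, b + non-detections).
post_a = prior_a + detections
post_b = prior_b + (surveys - detections)
posterior = stats.beta(post_a, post_b)

print(f"prior mean      {prior_a / (prior_a + prior_b):.2f}")
print(f"data proportion {detections / surveys:.2f}")
print(f"posterior mean  {posterior.mean():.2f}")  # pulled between expert and data
print("95% credible interval", posterior.interval(0.95))
```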
Abstract:
Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We first discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation; and an expression language for digital cultural objects. In addition, we discuss our experience in testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.
Abstract:
Surveillance for invasive non-indigenous species (NIS) is an integral part of a quarantine system. Estimating the efficiency of a surveillance strategy relies on many uncertain parameters estimated by experts, such as the efficiency of its components in the face of the specific NIS, the ability of the NIS to inhabit different environments, and so on. Due to the importance of detecting an invasive NIS within a critical period of time, it is crucial that these uncertainties be accounted for in the design of the surveillance system. We formulate a detection model that takes into account, in addition to structured sampling for incursive NIS, incidental detection by untrained workers. We use info-gap theory for satisficing (not maximizing) the probability of detection, while at the same time maximizing the robustness to uncertainty. We demonstrate the trade-off between robustness to uncertainty and an increase in the required probability of detection. An empirical example based on the detection of Pheidole megacephala on Barrow Island demonstrates the use of info-gap analysis to select a surveillance strategy.
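As a rough numerical illustration of the robustness trade-off described above, the sketch below asks how large an error in the nominal per-device detection probability a design can absorb while still meeting a required probability of detection. The device counts and probabilities are invented and the model (independent devices) is deliberately simple; it is not the Barrow Island analysis itself.

```python
# Rough illustration of the info-gap robustness idea: how far can the nominal
# per-device detection probability err (uncertainty horizon alpha) before the
# surveillance design no longer meets a required probability of detection?
import numpy as np

def detection_probability(p_per_device, n_devices=50):
    """System detects the NIS if at least one device does."""
    return 1.0 - (1.0 - p_per_device) ** n_devices

def robustness(p_nominal, required_pd, n_devices=50):
    """Largest horizon alpha such that even the worst-case per-device probability,
    p_nominal - alpha, still satisfies the detection requirement."""
    alphas = np.linspace(0.0, p_nominal, 1000)
    worst_case = detection_probability(p_nominal - alphas, n_devices)
    ok = alphas[worst_case >= required_pd]
    return ok.max() if ok.size else 0.0

p_nominal = 0.05  # hypothetical best estimate of per-device detection probability
for required in (0.80, 0.90, 0.95, 0.99):
    print(f"required Pd {required:.2f} -> robustness alpha = {robustness(p_nominal, required):.3f}")
```

Raising the required probability of detection shrinks the robustness, and a requirement above the nominal performance leaves zero robustness, which is the trade-off the abstract refers to.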
Abstract:
We consider the problem of designing a surveillance system to detect a broad range of invasive species across a heterogeneous sampling frame. We present a model to detect a range of invasive invertebrates whilst addressing the challenges of multiple data sources, stratifying for differential risk, managing labour costs and providing sufficient power of detection. We determine the number of detection devices required and their allocation across the landscape within limiting resource constraints. The resulting plan will lead to reduced financial and ecological costs and an optimal surveillance system.
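A back-of-envelope version of the sizing question, assuming devices detect independently and ignoring stratification, data sources and labour costs, looks like the sketch below; all numbers are illustrative.

```python
# How many detection devices are needed in a stratum to reach a target power of
# detection, assuming independent devices? Only the basic calculation is shown.
import math

def devices_needed(per_device_p, target_power):
    """Smallest n with 1 - (1 - p)^n >= target_power."""
    return math.ceil(math.log(1.0 - target_power) / math.log(1.0 - per_device_p))

for p in (0.01, 0.05, 0.10):
    print(f"per-device detection prob {p:.2f}: "
          f"{devices_needed(p, 0.95)} devices for 95% power")
```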
Abstract:
The automatic extraction of road features from remotely sensed images has been a topic of great interest in the photogrammetric and remote sensing communities for over three decades. Although various techniques have been reported in the literature, it remains challenging to extract road details efficiently, given increasing image resolution and the requirement for accurate and up-to-date road data. In this paper, we focus on the automatic detection of road lane markings, which are crucial for many applications, including lane-level navigation and lane departure warning. The approach consists of four steps: i) data preprocessing, ii) image segmentation and road surface detection, iii) road lane marking extraction based on the generated road surface, and iv) testing and system evaluation. The proposed approach uses the unsupervised ISODATA image segmentation algorithm, which segments the image into vegetation regions and road surface based only on the Cb component of the YCbCr color space. A shadow detection method based on the YCbCr color space is also employed to detect and recover the regions of the road surface shadowed by vehicles and trees. Finally, the lane marking features are detected from the road surface using histogram clustering. Experiments applying the proposed method to an aerial imagery dataset of Gympie, Queensland, demonstrate the efficiency of the approach.
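A hedged sketch of the colour-space step is given below: convert to YCbCr, threshold the Cb channel to separate vegetation from road surface, and pick bright pixels on the road as candidate lane markings. The file names and the percentile rule for markings are placeholders, and a simple ISODATA threshold stands in for the paper's unsupervised ISODATA segmentation.

```python
# Illustrative colour-space sketch, not the paper's exact pipeline.
import cv2
import numpy as np
from skimage.filters import threshold_isodata

img = cv2.imread("aerial_tile.png")                 # hypothetical input tile
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)      # OpenCV orders channels Y, Cr, Cb
y, cr, cb = cv2.split(ycrcb)

# ISODATA-style threshold on the Cb channel separates vegetation from road surface.
t = threshold_isodata(cb)
road_mask = (cb >= t).astype(np.uint8)              # which side is "road" depends on the scene

# Candidate lane markings: bright (high-luminance) pixels restricted to the road surface.
bright = y > np.percentile(y[road_mask == 1], 95)
marking_mask = (bright & (road_mask == 1)).astype(np.uint8) * 255

cv2.imwrite("road_mask.png", road_mask * 255)
cv2.imwrite("lane_markings.png", marking_mask)
```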
Abstract:
The World Wide Web has become a medium for people to share information. People use Web-based collaborative tools such as question answering (QA) portals, blogs/forums, email and instant messaging to acquire information and to form online communities. In an online QA portal, a user asks a question and other users can provide answers based on their knowledge, with a question usually being answered by many users. It can become overwhelming and time- and resource-consuming for a user to read all of the answers provided for a given question. Thus, there is a need for a mechanism to rank the provided answers so users can focus on reading only good quality answers. The majority of online QA systems use user feedback to rank users’ answers, and the user who asked the question can decide on the best answer. Other users who did not participate in answering the question can also vote to determine the best answer. However, ranking the best answer via this collaborative method is time consuming and requires ongoing involvement of users to provide the needed feedback. The objective of this research is to discover a way to automatically recommend the best answer as part of a ranked list of answers for a posted question, without the need for user feedback. The proposed approach combines a non-content-based reputation method and a content-based method to solve the problem of recommending the best answer to the user who posted the question. The non-content method assigns a score to each user which reflects the user’s reputation level in using the QA portal system. Each user is assigned two types of non-content-based reputation scores: a local reputation score and a global reputation score. The local reputation score plays an important role in deciding the reputation level of a user for the category in which the question is asked. The global reputation score indicates the prestige of a user across all of the categories in the QA system. Due to the possibility of user cheating, such as awarding the best answer to a friend regardless of the answer quality, a content-based method for determining the quality of a given answer is proposed alongside the non-content-based reputation method. Answers for a question from different users are compared with an ideal (or expert) answer using traditional Information Retrieval and Natural Language Processing techniques. Each answer provided for a question is assigned a content score according to how well it matches the ideal answer. To evaluate the performance of the proposed methods, each recommended best answer is compared with the best answer determined by one of the most popular link analysis methods, Hyperlink-Induced Topic Search (HITS). The proposed methods yield high accuracy, as shown by Kendall and Spearman correlation scores. The reputation method outperforms the HITS method in recommending the best answer. The inclusion of the reputation score with the content score improves the overall performance, as measured by Top-n match scores.
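A minimal sketch of the score combination is shown below: the content score is a TF-IDF cosine similarity between each answer and an ideal answer, blended with the answering user's reputation. The blending weight, the reputation values and the toy texts are invented; the thesis defines the local and global reputation scores in detail.

```python
# Rough sketch of blending a non-content reputation score with a content score
# to recommend a best answer without waiting for voter feedback.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ideal_answer = "restart the router and renew the dhcp lease to fix the ip conflict"
answers = {
    "user_a": "try restarting your router, then renew the dhcp lease",
    "user_b": "i had the same problem once",
    "user_c": "release and renew the dhcp lease; rebooting the router also helps",
}
reputation = {"user_a": 0.7, "user_b": 0.2, "user_c": 0.9}   # hypothetical local+global blend

vec = TfidfVectorizer().fit([ideal_answer] + list(answers.values()))
ideal_vec = vec.transform([ideal_answer])

def combined_score(user, text, alpha=0.5):
    """alpha weights reputation against content similarity to the ideal answer."""
    content = cosine_similarity(ideal_vec, vec.transform([text]))[0, 0]
    return alpha * reputation[user] + (1 - alpha) * content

ranked = sorted(answers, key=lambda u: combined_score(u, answers[u]), reverse=True)
print(ranked)  # recommended best answer first
```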
Abstract:
Buffer overflow vulnerabilities continue to prevail and the sophistication of attacks targeting these vulnerabilities is continuously increasing. As a successful attack of this type has the potential to completely compromise the integrity of the targeted host, early detection is vital. This thesis examines generic approaches for detecting executable payload attacks, without prior knowledge of the implementation of the attack, in such a way that new and previously unseen attacks are detectable. Executable payloads are analysed in detail for attacks targeting the Linux and Windows operating systems executing on the Intel IA-32 architecture. The execution flow of attack payloads is analysed and a generic model of execution is examined. A novel classification scheme for executable attack payloads is presented which allows for the characterisation of executable payloads and facilitates vulnerability and threat assessments, as well as intrusion detection capability assessments for intrusion detection systems. An intrusion detection capability assessment may be utilised to determine whether or not a deployed system is able to detect a specific attack, and to identify requirements for intrusion detection functionality for the development of new detection methods. Two novel detection methods are presented that are capable of detecting new and previously unseen executable attack payloads. The detection methods are capable of identifying and enumerating the executable payload’s interactions with the operating system on the targeted host at the time of compromise. The detection methods are further validated using real-world data, including executable payload attacks.
Abstract:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially, and it is desirable that a search engine can search for information over collections of documents in other languages. This research investigates techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This makes Chinese information retrieval more difficult: a retrieved document that contains the query term as a sequence of Chinese characters may not actually be relevant to the query, because that character sequence may not form a valid Chinese word in the document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with these problems. In the first approach, we propose a hybrid Chinese information retrieval model that incorporates word-based techniques into the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents according to their relevancy to the query, calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if only Chinese segmentation is incorporated into the traditional character-based approach. In the second approach, we propose a novel query expansion method which applies text mining techniques to find the words most relevant for extending the query. Unlike most existing query expansion methods, which generally select highly frequent indexing terms from the retrieved documents to expand the query, our approach uses text mining techniques to find patterns in the retrieved documents that highly correlate with the query term and then uses the relevant words in those patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage investigates whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, the text mining based query expansion approach is implemented and a further experiment compares its performance with that of the standard Rocchio approach. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that incorporating the text mining based query expansion into the hybrid model achieves significant improvements in both precision and recall.
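A small sketch of the query-expansion idea follows: rather than taking the most frequent terms from the retrieved documents, keep only terms that co-occur with the query term across those documents and append them to the query. A plain co-occurrence count stands in here for the thesis's pattern mining, and the segmented documents are invented.

```python
# Co-occurrence based query expansion sketch (a stand-in for pattern-based mining):
# expansion candidates are terms that appear in retrieved documents together with
# the query term, not merely the most frequent terms overall.
from collections import Counter

def expand_query(query_term, retrieved_docs, top_k=3):
    cooccur = Counter()
    for doc in retrieved_docs:                    # each doc is a list of segmented words
        if query_term in doc:
            cooccur.update(w for w in doc if w != query_term)
    return [query_term] + [w for w, _ in cooccur.most_common(top_k)]

retrieved = [
    ["北京", "奥运", "场馆", "建设"],
    ["奥运", "开幕式", "北京"],
    ["股市", "上涨", "建设"],
]
print(expand_query("北京", retrieved))   # e.g. ['北京', '奥运', '场馆', '建设']
```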