370 resultados para computer-based instrumentation
Resumo:
Textual document set has become an important and rapidly growing information source in the web. Text classification is one of the crucial technologies for information organisation and management. Text classification has become more and more important and attracted wide attention of researchers from different research fields. In this paper, many feature selection methods, the implement algorithms and applications of text classification are introduced firstly. However, because there are much noise in the knowledge extracted by current data-mining techniques for text classification, it leads to much uncertainty in the process of text classification which is produced from both the knowledge extraction and knowledge usage, therefore, more innovative techniques and methods are needed to improve the performance of text classification. It has been a critical step with great challenge to further improve the process of knowledge extraction and effectively utilization of the extracted knowledge. Rough Set decision making approach is proposed to use Rough Set decision techniques to more precisely classify the textual documents which are difficult to separate by the classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies, to demonstrate the Rough Set concepts and the decision making approach based on Rough Set theory for building more reliable and effective text classification framework with higher precision, to set up an innovative evaluation metric named CEI which is very effective for the performance assessment of the similar research, and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other relative fields.
Resumo:
A security system based on the recognition of the iris of human eyes using the wavelet transform is presented. The zero-crossings of the wavelet transform are used to extract the unique features obtained from the grey-level profiles of the iris. The recognition process is performed in two stages. The first stage consists of building a one-dimensional representation of the grey-level profiles of the iris, followed by obtaining the wavelet transform zerocrossings of the resulting representation. The second stage is the matching procedure for iris recognition. The proposed approach uses only a few selected intermediate resolution levels for matching, thus making it computationally efficient as well as less sensitive to noise and quantisation errors. A normalisation process is implemented to compensate for size variations due to the possible changes in the camera-to-face distance. The technique has been tested on real images in both noise-free and noisy conditions. The technique is being investigated for real-time implementation, as a stand-alone system, for access control to high-security areas.
Resumo:
In recent years, the Web 2.0 has provided considerable facilities for people to create, share and exchange information and ideas. Upon this, the user generated content, such as reviews, has exploded. Such data provide a rich source to exploit in order to identify the information associated with specific reviewed items. Opinion mining has been widely used to identify the significant features of items (e.g., cameras) based upon user reviews. Feature extraction is the most critical step to identify useful information from texts. Most existing approaches only find individual features about a product without revealing the structural relationships between the features which usually exist. In this paper, we propose an approach to extract features and feature relationships, represented as a tree structure called feature taxonomy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature taxonomy profiles the product at multiple levels and provides more detailed information about the product. Our experiment results based on some popularly used review datasets show that our proposed approach is able to capture the product features and relations effectively.
Resumo:
Currently, the GNSS computing modes are of two classes: network-based data processing and user receiver-based processing. A GNSS reference receiver station essentially contributes raw measurement data in either the RINEX file format or as real-time data streams in the RTCM format. Very little computation is carried out by the reference station. The existing network-based processing modes, regardless of whether they are executed in real-time or post-processed modes, are centralised or sequential. This paper describes a distributed GNSS computing framework that incorporates three GNSS modes: reference station-based, user receiver-based and network-based data processing. Raw data streams from each GNSS reference receiver station are processed in a distributed manner, i.e., either at the station itself or at a hosting data server/processor, to generate station-based solutions, or reference receiver-specific parameters. These may include precise receiver clock, zenith tropospheric delay, differential code biases, ambiguity parameters, ionospheric delays, as well as line-of-sight information such as azimuth and elevation angles. Covariance information for estimated parameters may also be optionally provided. In such a mode the nearby precise point positioning (PPP) or real-time kinematic (RTK) users can directly use the corrections from all or some of the stations for real-time precise positioning via a data server. At the user receiver, PPP and RTK techniques are unified under the same observation models, and the distinction is how the user receiver software deals with corrections from the reference station solutions and the ambiguity estimation in the observation equations. Numerical tests demonstrate good convergence behaviour for differential code bias and ambiguity estimates derived individually with single reference stations. With station-based solutions from three reference stations within distances of 22–103 km the user receiver positioning results, with various schemes, show an accuracy improvement of the proposed station-augmented PPP and ambiguity-fixed PPP solutions with respect to the standard float PPP solutions without station augmentation and ambiguity resolutions. Overall, the proposed reference station-based GNSS computing mode can support PPP and RTK positioning services as a simpler alternative to the existing network-based RTK or regionally augmented PPP systems.
Resumo:
In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, Webput utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.
Resumo:
Topic modelling has been widely used in the fields of information retrieval, text mining, machine learning, etc. In this paper, we propose a novel model, Pattern Enhanced Topic Model (PETM), which makes improvements to topic modelling by semantically representing topics with discriminative patterns, and also makes innovative contributions to information filtering by utilising the proposed PETM to determine document relevance based on topics distribution and maximum matched patterns proposed in this paper. Extensive experiments are conducted to evaluate the effectiveness of PETM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
Resumo:
Agent-based modelling (ABM), like other modelling techniques, is used to answer specific questions from real world systems that could otherwise be expensive or impractical. Its recent gain in popularity can be attributed to some degree to its capacity to use information at a fine level of detail of the system, both geographically and temporally, and generate information at a higher level, where emerging patterns can be observed. This technique is data-intensive, as explicit data at a fine level of detail is used and it is computer-intensive as many interactions between agents, which can learn and have a goal, are required. With the growing availability of data and the increase in computer power, these concerns are however fading. Nonetheless, being able to update or extend the model as more information becomes available can become problematic, because of the tight coupling of the agents and their dependence on the data, especially when modelling very large systems. One large system to which ABM is currently applied is the electricity distribution where thousands of agents representing the network and the consumers’ behaviours are interacting with one another. A framework that aims at answering a range of questions regarding the potential evolution of the grid has been developed and is presented here. It uses agent-based modelling to represent the engineering infrastructure of the distribution network and has been built with flexibility and extensibility in mind. What distinguishes the method presented here from the usual ABMs is that this ABM has been developed in a compositional manner. This encompasses not only the software tool, which core is named MODAM (MODular Agent-based Model) but the model itself. Using such approach enables the model to be extended as more information becomes available or modified as the electricity system evolves, leading to an adaptable model. Two well-known modularity principles in the software engineering domain are information hiding and separation of concerns. These principles were used to develop the agent-based model on top of OSGi and Eclipse plugins which have good support for modularity. Information regarding the model entities was separated into a) assets which describe the entities’ physical characteristics, and b) agents which describe their behaviour according to their goal and previous learning experiences. This approach diverges from the traditional approach where both aspects are often conflated. It has many advantages in terms of reusability of one or the other aspect for different purposes as well as composability when building simulations. For example, the way an asset is used on a network can greatly vary while its physical characteristics are the same – this is the case for two identical battery systems which usage will vary depending on the purpose of their installation. While any battery can be described by its physical properties (e.g. capacity, lifetime, and depth of discharge), its behaviour will vary depending on who is using it and what their aim is. The model is populated using data describing both aspects (physical characteristics and behaviour) and can be updated as required depending on what simulation is to be run. For example, data can be used to describe the environment to which the agents respond to – e.g. weather for solar panels, or to describe the assets and their relation to one another – e.g. the network assets. Finally, when running a simulation, MODAM calls on its module manager that coordinates the different plugins, automates the creation of the assets and agents using factories, and schedules their execution which can be done sequentially or in parallel for faster execution. Building agent-based models in this way has proven fast when adding new complex behaviours, as well as new types of assets. Simulations have been run to understand the potential impact of changes on the network in terms of assets (e.g. installation of decentralised generators) or behaviours (e.g. response to different management aims). While this platform has been developed within the context of a project focussing on the electricity domain, the core of the software, MODAM, can be extended to other domains such as transport which is part of future work with the addition of electric vehicles.
Resumo:
Social networking sites (SNSs), with their large numbers of users and large information base, seem to be perfect breeding grounds for exploiting the vulnerabilities of people, the weakest link in security. Deceiving, persuading, or influencing people to provide information or to perform an action that will benefit the attacker is known as “social engineering.” While technology-based security has been addressed by research and may be well understood, social engineering is more challenging to understand and manage, especially in new environments such as SNSs, owing to some factors of SNSs that reduce the ability of users to detect the attack and increase the ability of attackers to launch it. This work will contribute to the knowledge of social engineering by presenting the first two conceptual models of social engineering attacks in SNSs. Phase-based and source-based models are presented, along with an intensive and comprehensive overview of different aspects of social engineering threats in SNSs.
Resumo:
While social engineering represents a real and ominous threat to many organizations, companies, governments, and individuals, social networking sites (SNSs), have been identified as among the most common means of social engineering attacks. Owing to factors that reduce the ability of users to detect social engineering tricks and increase the ability of attackers to launch them, SNSs seem to be perfect breeding ground for exploiting the vulnerabilities of people, and the weakest link in security. This work will contribute to the knowledge of social engineering by identifying different entities and subentities that affect social engineering based attacks in SNSs. Moreover, this paper includes an intensive and comprehensive overview of different aspects of social engineering threats in SNSs.
Resumo:
Many mature term-based or pattern-based approaches have been used in the field of information filtering to generate users’ information needs from a collection of documents. A fundamental assumption for these approaches is that the documents in the collection are all about one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, and this has been widely utilized in the fields of machine learning and information retrieval, etc. But its effectiveness in information filtering has not been so well explored. Patterns are always thought to be more discriminative than single terms for describing documents. However, the enormous amount of discovered patterns hinder them from being effectively and efficiently used in real applications, therefore, selection of the most discriminative and representative patterns from the huge amount of discovered patterns becomes crucial. To deal with the above mentioned limitations and problems, in this paper, a novel information filtering model, Maximum matched Pattern-based Topic Model (MPBTM), is proposed. The main distinctive features of the proposed model include: (1) user information needs are generated in terms of multiple topics; (2) each topic is represented by patterns; (3) patterns are generated from topic models and are organized in terms of their statistical and taxonomic features, and; (4) the most discriminative and representative patterns, called Maximum Matched Patterns, are proposed to estimate the document relevance to the user’s information needs in order to filter out irrelevant documents. Extensive experiments are conducted to evaluate the effectiveness of the proposed model by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models
Resumo:
Whole image descriptors have recently been shown to be remarkably robust to perceptual change especially compared to local features. However, whole-image-based localization systems typically rely on heuristic methods for determining appropriate matching thresholds in a particular environment. These environment-specific tuning requirements and the lack of a meaningful interpretation of these arbitrary thresholds limits the general applicability of these systems. In this paper we present a Bayesian model of probability for whole-image descriptors that can be seamlessly integrated into localization systems designed for probabilistic visual input. We demonstrate this method using CAT-Graph, an appearance-based visual localization system originally designed for a FAB-MAP-style probabilistic input. We show that using whole-image descriptors as visual input extends CAT-Graph’s functionality to environments that experience a greater amount of perceptual change. We also present a method of estimating whole-image probability models in an online manner, removing the need for a prior training phase. We show that this online, automated training method can perform comparably to pre-trained, manually tuned local descriptor methods.
Resumo:
This paper presents a distributed communication based active power curtailment (APC) control scheme for grid connected photovoltaic (PV) systems to address voltage rise. A simple distribution feeder model is presented and simulated using MATLAB. The resource sharing based control scheme proposed is shown to be effective at reducing voltage rise during times of peak generation and low load. Simulations also show the even distribution of APC using simple communications. Simulations demonstrate the versatility of the proposed control method under major communication failure conditions. Further research may lead to possible applications in coordinated electric vehicle (EV) charging.
Resumo:
A victim of phishing emails could be subjected to money loss and identity theft. This paper investigates the different types of phishing email victims, with the goal of increasing such victims' defences. To obtain this kind of information, an experiment which involves sending a phishing email to participants is conducted. Quantitative and qualitative methods are also used to collect users' information. A model for detecting deception has been employed to understand victims' behaviour. This paper reports the qualitative results. The findings suggest that victims of phishing emails do not always exhibit the same vulnerability. The cause of being a victim is a result of three weaknesses in the detection process: (1) lack of knowledge; (2) weak confirmation channel, and; (3) victims' high propensity towards risk-taking. Therefore, it is suggested that users be provided with suitable confirmation channels and be more risk averse in their behaviour so that they would not fall victim to phishing emails.
Resumo:
Voltage drop at network peak hours is a significant power quality problem in Low Voltage (LV) distribution feeders. Recently, voltage rise due to high penetration of Photovoltaic cells (PVs) has been creating a new power quality problem during noon periods. In this paper, a voltage control strategy is proposed for the household installed PVs to regulate the voltage along the LV feeder. For this purpose, each PV is controlled to exchange reactive power with the grid. A droop control method is utilized to coordinate the reactive power exchange of each PV. The proposed method is a decentralized local voltage support since it is based on only local measurements and does not require any communication with other PVs. The required converter and filter structure and control algorithms are proposed to ensure the dynamic performance of the system. The study focuses on 3-phase PVs. The network is studied at network peak and off-peak periods, separately. The efficacy of the proposed voltage support concept is verified through numerical and dynamic analyses with MATLAB and PSCAD/EMTDC.
Resumo:
The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on initial cluster centers and converges to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley’s Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.