899 resultados para Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining
Resumo:
The management and sharing of complex data, information and knowledge is a fundamental and growing concern in the Water and other Industries for a variety of reasons. For example, risks and uncertainties associated with climate, and other changes require knowledge to prepare for a range of future scenarios and potential extreme events. Formal ways in which knowledge can be established and managed can help deliver efficiencies on acquisition, structuring and filtering to provide only the essential aspects of the knowledge really needed. Ontologies are a key technology for this knowledge management. The construction of ontologies is a considerable overhead on any knowledge management programme. Hence current computer science research is investigating generating ontologies automatically from documents using text mining and natural language techniques. As an example of this, results from application of the Text2Onto tool to stakeholder documents for a project on sustainable water cycle management in new developments are presented. It is concluded that by adopting ontological representations sooner, rather than later in an analytical process, decision makers will be able to make better use of highly knowledgeable systems containing automated services to ensure that sustainability considerations are included. © 2010 The authors.
Resumo:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method- agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks, traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool that is called XLMiner™ for Microsoft Excel Office.
Resumo:
The principles of design of information-analytical system (IAS) intended for design of new inorganic compounds are considered. IAS includes the integrated system of databases on properties of inorganic substances and materials, the system of the programs of pattern recognition, the knowledge base and managing program. IAS allows a prediction of inorganic compounds not yet synthesized and estimation of their some properties.
Resumo:
This paper introduces an encoding of knowledge representation statements as regular languages and proposes a two-phase approach to processing of explicitly declared conceptual information. The idea is presented for the simple conceptual graphs where conceptual pattern search is implemented by the so called projection operation. Projection calculations are organised into off-line preprocessing and run-time computations. This enables fast run-time treatment of NP-complete problems, given that the intermediate results of the off-line phase are kept in suitable data structures. The experiments with randomly-generated, middle-size knowledge bases support the claim that the suggested approach radically improves the run-time conceptual pattern search.
Resumo:
In this work the new pattern recognition method based on the unification of algebraic and statistical approaches is described. The main point of the method is the voting procedure upon the statistically weighted regularities, which are linear separators in two-dimensional projections of feature space. The report contains brief description of the theoretical foundations of the method, description of its software realization and the results of series of experiments proving its usefulness in practical tasks.
Resumo:
The paper introduces a method for dependencies discovery during human-machine interaction. It is based on an analysis of numerical data sets in knowledge-poor environments. The driven procedures are independent and they interact on a competitive principle. The research focuses on seven of them. The application is in Number Theory.
Resumo:
This paper presents a new, dynamic feature representation method for high value parts consisting of complex and intersecting features. The method first extracts features from the CAD model of a complex part. Then the dynamic status of each feature is established between various operations to be carried out during the whole manufacturing process. Each manufacturing and verification operation can be planned and optimized using the real conditions of a feature, thus enhancing accuracy, traceability and process control. The dynamic feature representation is complementary to the design models used as underlining basis in current CAD/CAM and decision support systems. © 2012 CIRP.
Resumo:
In this demo the basic text mining technologies by using RapidMining have been reviewed. RapidMining basic characteristics and operators of text mining have been described. Text mining example by using Navie Bayes algorithm and process modeling have been revealed.
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. ^ This thesis describes a heterogeneous database system being developed at High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work. ^
Resumo:
The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in the individual users' subjective perceptions of multimedia content. ^ The objectives of this dissertation are to develop an integrated multimedia indexing and retrieval framework with the aim to bridge the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques have been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable the semantic search for images/videos, and can be tailored to solve the problems in other multimedia related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) for general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images. (2) In addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user is allowed to more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection. ^ Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model. ^
Resumo:
If we classify variables in a program into various security levels, then a secure information flow analysis aims to verify statically that information in a program can flow only in ways consistent with the specified security levels. One well-studied approach is to formulate the rules of the secure information flow analysis as a type system. A major trend of recent research focuses on how to accommodate various sophisticated modern language features. However, this approach often leads to overly complicated and restrictive type systems, making them unfit for practical use. Also, problems essential to practical use, such as type inference and error reporting, have received little attention. This dissertation identified and solved major theoretical and practical hurdles to the application of secure information flow. ^ We adopted a minimalist approach to designing our language to ensure a simple lenient type system. We started out with a small simple imperative language and only added features that we deemed most important for practical use. One language feature we addressed is arrays. Due to the various leaking channels associated with array operations, arrays have received complicated and restrictive typing rules in other secure languages. We presented a novel approach for lenient array operations, which lead to simple and lenient typing of arrays. ^ Type inference is necessary because usually a user is only concerned with the security types for input/output variables of a program and would like to have all types for auxiliary variables inferred automatically. We presented a type inference algorithm B and proved its soundness and completeness. Moreover, algorithm B stays close to the program and the type system and therefore facilitates informative error reporting that is generated in a cascading fashion. Algorithm B and error reporting have been implemented and tested. ^ Lastly, we presented a novel framework for developing applications that ensure user information privacy. In this framework, core computations are defined as code modules that involve input/output data from multiple parties. Incrementally, secure flow policies are refined based on feedback from the type checking/inference. Core computations only interact with code modules from involved parties through well-defined interfaces. All code modules are digitally signed to ensure their authenticity and integrity. ^
Resumo:
An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system that is capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make a large amount of transit performance data available that are valuable for transit performance management and planning purposes. However, the difficulties of extracting useful information from the huge spatial-temporal database have hindered off-line applications of the AVL data. ^ In this study, a data mining process, including data integration, cluster analysis, and multiple regression, is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. The model-based cluster method is employed to investigate the spatial and temporal patterns of transit travel speeds, which may be easily translated into travel time. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening periods are determined based on analyses of transit travel speed variations for different times of day. The seasonal patterns of transit performance are investigated by using the analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using important factors identified as having significant effects on speed for different time-of-day periods. ^ It has been found that transit performance varied from different seasons and different time-of-day periods. The geographic location of a transit route segment also plays a role in the variation of the transit performance. The results of this research indicate that advanced data mining techniques have good potential in providing automated techniques of assisting transit agencies in service planning, scheduling, and operations control. ^
Resumo:
The nation's freeway systems are becoming increasingly congested. A major contribution to traffic congestion on freeways is due to traffic incidents. Traffic incidents are non-recurring events such as accidents or stranded vehicles that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help the process of identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. ^ This dissertation research attempts to improve on this previous effort by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid in the decision making of traffic diversion in the event of an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. The multiple linear regression analysis and decision tree based method were applied to develop the offline models, and the rule-based method and a tree algorithm called M5P were used to develop the online models. ^ The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications. ^
Resumo:
A number of patterning methods including conventional photo-lithography and E-beam lithography have been employed to pattern devices with critical dimensions of submicrometer levels. The methods of device fabrication by lithography and multilevel processing are usually specific to the chemical and physical properties of the etchants and materials used, and require a number of processing steps. As an alternative, focused ion beam (FIB) lithography is a unique and straightforward tool to rapidly develop nanomagnetic prototyping devices. This feature of FIB is critical to conduct the basic study necessary to advance the state-of-the-art in magnetic recording. ^ The dissertation develops a specific design of nanodevices and demonstrates FIB-fabricated stable and reproducible magnetic nanostructures with a critical dimension of about 10 nm. The project included the fabrication of a patterned single and multilayer magnetic media with areal densities beyond 10 Terabit/in 2. Each block had perpendicular or longitudinal magnetic anisotropy and a single domain structure. The purpose was to demonstrate how the ability of FIB to directly etch nanoscale patterns allowed exploring (even in the academic environment) the true physics of various types of nanostructures. ^ Another goal of this study was the investigation of FIB patterned magnetic media with a set of characterization tools: e.g. Spinstand Guzik V2002, magnetic force microscopy, scanning electron microscopy with energy dispersive system and wavelength dispersive system. ^ In the course of this work, a unique prototype of a record high density patterned magnetic media device capable of 10 terabit/in 2 was built. The read/write testing was performed by a Guzik spinstand. The readback signals were recorded and analyzed by a digital oscilloscope. A number of different configurations for writing and reading information from a magnetic medium were explored. The prototype transducers for this work were fabricated via FIB trimming of different magnetic recording heads. ^
Resumo:
Recent advances in airborne Light Detection and Ranging (LIDAR) technology allow rapid and inexpensive measurements of topography over large areas. Airborne LIDAR systems usually return a 3-dimensional cloud of point measurements from reflective objects scanned by the laser beneath the flight path. This technology is becoming a primary method for extracting information of different kinds of geometrical objects, such as high-resolution digital terrain models (DTMs), buildings and trees, etc. In the past decade, LIDAR gets more and more interest from researchers in the field of remote sensing and GIS. Compared to the traditional data sources, such as aerial photography and satellite images, LIDAR measurements are not influenced by sun shadow and relief displacement. However, voluminous data pose a new challenge for automated extraction the geometrical information from LIDAR measurements because many raster image processing techniques cannot be directly applied to irregularly spaced LIDAR points. ^ In this dissertation, a framework is proposed to filter out information about different kinds of geometrical objects, such as terrain and buildings from LIDAR automatically. They are essential to numerous applications such as flood modeling, landslide prediction and hurricane animation. The framework consists of several intuitive algorithms. Firstly, a progressive morphological filter was developed to detect non-ground LIDAR measurements. By gradually increasing the window size and elevation difference threshold of the filter, the measurements of vehicles, vegetation, and buildings are removed, while ground data are preserved. Then, building measurements are identified from no-ground measurements using a region growing algorithm based on the plane-fitting technique. Raw footprints for segmented building measurements are derived by connecting boundary points and are further simplified and adjusted by several proposed operations to remove noise, which is caused by irregularly spaced LIDAR measurements. To reconstruct 3D building models, the raw 2D topology of each building is first extracted and then further adjusted. Since the adjusting operations for simple building models do not work well on 2D topology, 2D snake algorithm is proposed to adjust 2D topology. The 2D snake algorithm consists of newly defined energy functions for topology adjusting and a linear algorithm to find the minimal energy value of 2D snake problems. Data sets from urbanized areas including large institutional, commercial, and small residential buildings were employed to test the proposed framework. The results demonstrated that the proposed framework achieves a very good performance. ^