168 results for mining concession contract


Relevance:

20.00%

Publisher:

Abstract:

This paper introduces an incremental FP-Growth approach to Web-content-based data mining and applies it to a real-world problem. The problem is solved in the following way. First, we obtain semi-structured data from Web pages about the Chinese car market, structure the data, and save it in a local database. Second, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preferences. To find more general regularities, an attribute-oriented induction method is also used to find customers' consumption preferences across a range of car categories. Experimental results reveal some interesting consumption preferences that can help decision makers set policy to encourage and guide car consumption. Although the data we used may not be the best representation of the actual market, it is good enough for decision-making purposes, in that it reflects the real situation of car consumption preferences under the two assumptions made in the context.
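To make the notion of an association rule concrete, here is a minimal sketch of support and confidence for item pairs. It uses brute-force pair counting, not FP-Growth and not the incremental variant the paper describes; the function name `pair_rules`, the thresholds, and the car records are all hypothetical and invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical records, invented for illustration only.
records = [
    {"sedan", "petrol", "mid-price"},
    {"sedan", "petrol", "low-price"},
    {"suv", "diesel", "mid-price"},
    {"sedan", "petrol", "mid-price"},
]
rules = pair_rules(records)
```

FP-Growth reaches the same frequent itemsets without enumerating candidate pairs, by compressing the transactions into a prefix tree; the sketch above only shows what the resulting rules measure.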

Relevance:

20.00%

Publisher:

Abstract:

Automating software engineering has been the dream of software engineers for decades. Data mining can play an important role in making this dream come true. Our recent research has shown that, to increase productivity and reduce the cost of software development, it is essential to have an effective and efficient mechanism to store, manage, and utilize existing software resources, and thus to automate software analysis, testing, and evaluation and to reuse existing software for new problems. This paper first provides a brief overview of traditional data mining, followed by a presentation of data mining in a broader sense. Second, it presents the idea and technology of the software warehouse, an innovative approach to managing software resources based on the idea of the data warehouse, in which software assets are systematically accumulated, deposited, retrieved, packaged, managed, and utilized, driven by data mining and OLAP technologies. Third, we present the concepts, technology, and applications of data mining and data matrix, including the software warehouse, to software engineering. The role of the software warehouse and software mining in modern software development is also discussed. We expect that the results will lead to a streamlined, highly efficient software development process and enhance productivity in response to the modern challenges of designing and developing software applications.

Relevance:

20.00%

Publisher:

Abstract:

The approaches proposed in the past for discovering sequential patterns mainly focused on single sequential data. In the real world, however, some sequential patterns are hidden among multi-sequential event data. It has been noted that knowledge discovery with user-specified constraints, templates, or skeletons is receiving wide attention because it is more efficient and avoids the tedious selection of useful patterns from mass-produced results. In this paper, we present a novel pattern in correlated multi-sequential event data, together with an approach for mining it. We call this pattern a sequential causal pattern. A group of skeletons of sequential causal patterns, which may be specified by the user or generated by the program, are verified or mined by embedding them into the mining engine. Experiments show that this method, when applied to discovering regularities in the occurrence of a crop pest in a region, successfully mines sequential causal patterns with user-specified skeletons in multi-sequential event data.

Relevance:

20.00%

Publisher:

Abstract:

The development of the Internet has boosted the prosperity of the World Wide Web, which is now a huge information source. Because of the characteristics of the web, traditional database-based technologies are in most cases no longer suitable for web information retrieval and management. To manage web information effectively, it is necessary to reveal the intrinsic relationships and structures among web information objects by eliminating noise factors. This paper proposes a mechanism that could be widely used in information processing, including web information processing and noise factor elimination, to uncover more intrinsic relationships. As an application case of this mechanism, an algorithm for finding relevant web pages is proposed; it uncovers intrinsic relationships among web pages from their hyperlink patterns and finds more semantically relevant web pages. The experimental evaluation shows the feasibility and effectiveness of the algorithm and demonstrates the potential of the proposed mechanism in web applications.

Relevance:

20.00%

Publisher:

Abstract:

Data mining refers to extracting or "mining" knowledge from large amounts of data. It is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques to gain insight into the relationships and patterns hidden in the data. The availability of digital data within picture archiving and communication systems makes it possible to enhance health care and research through computer manipulation, processing, and handling of that data. That is the basis for the development of computer-assisted radiology. Further development of computer-assisted radiology is associated with the use of new intelligent capabilities, such as multimedia support and data mining, to discover knowledge relevant to diagnosis. It is very useful if the results of data mining can be communicated to humans in an understandable way. In this paper, we present our work on data mining in medical image archiving systems. We investigate the use of a very efficient data mining technique, a decision tree, to learn the knowledge needed for computer-assisted image analysis. We apply our method to the classification of x-ray images for lung cancer diagnosis. The proposed technique is based on an inductive decision tree learning algorithm that has low complexity with high transparency and accuracy. The results show that the proposed algorithm is robust, accurate, and fast, and that it produces a comprehensible structure summarizing the knowledge it induces.
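As an illustration of how an inductive decision tree learner can choose which attribute to split on, the sketch below computes entropy and information gain. The abstract does not say which splitting criterion the authors use, so information gain (as in ID3-style learners) is an assumption here, and the feature names and labels are hypothetical, not taken from the paper's x-ray data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting the rows on `feature`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical image descriptors, invented for illustration only.
rows = [
    {"opacity": "high", "side": "left"},
    {"opacity": "high", "side": "right"},
    {"opacity": "low", "side": "left"},
    {"opacity": "low", "side": "right"},
]
labels = ["abnormal", "abnormal", "normal", "normal"]

gain_opacity = information_gain(rows, labels, "opacity")
gain_side = information_gain(rows, labels, "side")
```

A learner of this kind would split on `opacity` first, because it separates the two classes completely while `side` tells it nothing; the resulting tree is what makes the model readable by a clinician.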

Relevance:

20.00%

Publisher:

Abstract:

This paper follows How (2000), who examined 130 Australian mining and energy initial public offerings (IPOs) from 1979 to 1990 and reported an average underpricing return of 107.18% for those IPOs. This study updates that report by investigating 127 Australian mining and energy IPOs from 1994 to 2001 and finds a substantially lower average first-day return of 17.93%. These updated findings have implications both for new companies seeking to float and for subscribers wishing to invest in these new listings.
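The abstract does not define the underpricing return. A common convention, assumed here rather than taken from the paper, measures it as the first-day return relative to the offer price, averaged over the n IPOs in the sample. The symbols below are introduced only for illustration, and the paper may use a variant such as a market-adjusted return.

```latex
R_i = \frac{P_{i,\mathrm{close}} - P_{i,\mathrm{offer}}}{P_{i,\mathrm{offer}}} \times 100\%,
\qquad
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} R_i
```

Here P_{i,offer} is the subscription price of IPO i and P_{i,close} is its closing price on the first day of trading.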

Relevance:

20.00%

Publisher:

Abstract:

The mining and energy sectors are particularly publicly sensitive and subject to a high degree of public scrutiny. Evan and Freeman (1993) suggest that the need for such scrutiny may be better met by having direct public stakeholder representation on the board of directors. Similarly, Bilimoria (2000) argues a strong commercial case for engaging women on boards. This paper investigates the number and proportion of non-equity-holding public stakeholder directors, and the number and proportion of women directors, on the boards of Australian mining and energy company initial public offerings (IPOs). It reports a paucity of public stakeholder directors and a low proportion of women on such IPO boards.

Relevance:

20.00%

Publisher:

Abstract:

In text categorization applications, class imbalance, an uneven data distribution in which one class is represented by far fewer instances than the others, is a commonly encountered problem. In such a situation, conventional classifiers tend to have a strong performance bias, which results in a high accuracy rate on the majority class but a very low rate on the minority classes. An extreme strategy for unbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily cause another type of bias, which increases the accuracy rate on the minority classes by sacrificing the majority class. This paper investigates approaches that reduce these two types of performance bias and improve the reliability of discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
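The performance bias described above is easy to reproduce. The sketch below, which uses hypothetical labels and a deliberately degenerate classifier rather than any method from the paper, shows how overall accuracy can look high while the minority class is never recognized.

```python
def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's instances predicted correctly}."""
    totals, hits = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Hypothetical corpus: 95 majority-class documents and 5 minority-class ones.
y_true = ["majority"] * 95 + ["minority"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["majority"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
```

Reporting a per-class figure next to overall accuracy is one simple way to expose both kinds of bias the abstract mentions, since a one-class model trained only on the minority would show the opposite pattern.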

Relevance:

20.00%

Publisher:

Abstract:

Automated adversarial detection systems can fail when under attack by adversaries. As part of a resilient data stream mining system that reduces the possibility of such failure, adaptive spike detection performs attribute ranking and selection without class labels. The first part of adaptive spike detection weighs all attributes for spiky-ness in order to rank them. The second part filters out some attributes with extreme weights to choose the best ones for computing each example's suspicion score. Within an identity crime detection domain, adaptive spike detection is validated on a few million real credit applications with adversarial activity. The results are F-measure curves for eleven experiments and a discussion of relative weights for the best experiment. The results reinforce the effectiveness of adaptive spike detection for class-label-free attribute ranking and selection.

Relevance:

20.00%

Publisher:

Abstract:

Traditional approaches such as theorem proving and model checking have been used successfully to analyze security protocols. Ideally, they assume that data communication is reliable and require the user to predetermine authentication goals. However, missing and inconsistent data have been largely ignored, and increasingly complicated security protocols make it difficult to predefine such goals. This paper presents a novel approach to analyzing security protocols using association rule mining. It is able not only to validate the reliability of transactions but also to discover potential correlations between secure messages. The algorithm and experiment demonstrate that our approach is useful and promising.

Relevance:

20.00%

Publisher:

Abstract:

This paper tells a story of synergism between two cutting-edge technologies: agents and data mining. Integrating these two technologies enhances the power of each. By integrating agents into data mining systems, or constructing data mining systems from an agent perspective, the flexibility of data mining systems can be greatly improved. New data mining techniques can be added to the systems dynamically in the form of agents, while out-of-date ones can be deleted from the systems at run time. Equipping agents with data mining capabilities makes them much smarter and more adaptable, which in turn improves the performance of these agent systems. A new way to integrate the two techniques, ontology-based integration, is also discussed. Case studies are given to demonstrate this mutual enhancement.

Relevance:

20.00%

Publisher:

Abstract:

In data stream applications, a good approximation obtained in a timely manner is often better than an exact answer that is delayed beyond the window of opportunity. Of course, the quality of the approximation is as important as its timely delivery. Unfortunately, algorithms capable of online processing do not conform strictly to a precise error guarantee. Since both online processing and a precise error bound are essential, stream algorithms need to meet both criteria. Yet this is not the case for mining frequent sets in data streams. We present EStream, a novel algorithm that allows online processing while producing results strictly within the error bound. Our theoretical and experimental results show that EStream is a better candidate for finding frequent sets in data streams when both constraints need to be satisfied.
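To show what an error-bounded approximate count over a stream looks like, here is a minimal sketch of Lossy Counting (Manku and Motwani), a well-known earlier algorithm. It is not EStream, whose details the abstract does not give, and it counts single items rather than itemsets. The function name `lossy_count` and the sample stream are assumptions for illustration.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item counts that undercount by at most epsilon * n.

    Returns (counts, n), where counts maps item -> [count, max_undercount].
    """
    width = math.ceil(1 / epsilon)  # number of items per bucket
    counts = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been seen (and pruned) in earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            counts = {k: v for k, v in counts.items() if v[0] + v[1] > bucket}
    return counts, n


# Hypothetical stream, invented for illustration only.
stream = list("abacabadabae")
counts, n = lossy_count(stream, epsilon=0.25)
```

In this run only `a` survives pruning: `b` really occurs 3 times, which is within the allowed undercount of epsilon * n = 3, so dropping it stays inside the guarantee. Memory stays small because infrequent items are discarded at every bucket boundary.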

Relevance:

20.00%

Publisher:

Abstract:

Most algorithms that focus on discovering frequent patterns from data streams assume that the machinery is capable of managing all incoming transactions without any delay and without the need to drop transactions. However, this assumption is often impractical due to the inherent characteristics of data stream environments. Under high load conditions especially, there is often a shortage of system resources to process the incoming transactions. This causes unwanted latencies that in turn affect the applicability of the data mining models produced, which often have a small window of opportunity. We propose a load shedding algorithm to address this issue. The algorithm adaptively detects overload situations and drops transactions from data streams using a probabilistic model. We tested our algorithm on both synthetic and real-life datasets to verify its feasibility.
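The general idea of probabilistic load shedding can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the overload test (arrival rate above capacity), the uniform keep probability, and the names `shed_load`, `capacity`, and `arrival_rate` are all hypothetical.

```python
import random


def shed_load(transactions, capacity, arrival_rate, rng=None):
    """Keep each transaction with probability min(1, capacity / arrival_rate).

    When the arrival rate is within capacity nothing is dropped; under
    overload, the expected number of kept transactions matches capacity.
    """
    rng = rng or random.Random()
    keep_probability = min(1.0, capacity / arrival_rate)
    return [t for t in transactions if rng.random() < keep_probability]


# Hypothetical workload: 10,000 transactions arriving at twice the capacity.
incoming = list(range(10_000))
kept = shed_load(incoming, capacity=500, arrival_rate=1000, rng=random.Random(0))
unloaded = shed_load(incoming, capacity=1000, arrival_rate=800)
```

Because the sample is uniform, an itemset's support in the kept transactions is an unbiased estimate of its support in the full stream, which is why random dropping degrades the mined patterns gracefully rather than arbitrarily.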