994 results for Mega-mining


Abstract:

This paper tells the story of the synergy between two cutting-edge technologies: agents and data mining. Integrating the two enhances the power of each. Integrating agents into data mining systems, or constructing data mining systems from an agent perspective, can greatly improve the flexibility of those systems. New data mining techniques can be added to a system dynamically in the form of agents, and out-of-date ones can be removed at run time. Equipping agents with data mining capabilities makes them smarter and more adaptable, which in turn improves the performance of the agent systems. A new way to integrate the two technologies, ontology-based integration, is also discussed. Case studies are given to demonstrate this mutual enhancement.
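The idea of adding and removing mining techniques "in the form of agents" at run time can be illustrated with a plug-in registry. The sketch below is a hypothetical simplification, not the paper's architecture: `AgentRegistry`, `register`, `retire`, `run_all` and the two example agents are invented names, and an "agent" is reduced to a plain callable over a list of transactions.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical agent interface: a callable that takes transactions
# (each a frozenset of items) and returns some mining result.
MiningAgent = Callable[[Sequence[frozenset]], object]


class AgentRegistry:
    """Hypothetical host that lets mining techniques be plugged in or
    removed while the system keeps running."""

    def __init__(self) -> None:
        self._agents: Dict[str, MiningAgent] = {}

    def register(self, name: str, agent: MiningAgent) -> None:
        # Adding a technique at run time: just a new registry entry.
        self._agents[name] = agent

    def retire(self, name: str) -> None:
        # Removing an out-of-date technique at run time.
        self._agents.pop(name, None)

    def names(self) -> List[str]:
        return sorted(self._agents)

    def run_all(self, transactions: Sequence[frozenset]) -> Dict[str, object]:
        # Dispatch the same data to every currently registered agent.
        return {name: agent(transactions) for name, agent in self._agents.items()}


def item_counts(transactions: Sequence[frozenset]) -> Dict[str, int]:
    """Example agent: number of transactions containing each item."""
    counts: Dict[str, int] = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return counts


def avg_length(transactions: Sequence[frozenset]) -> float:
    """Example agent: mean number of items per transaction."""
    if not transactions:
        return 0.0
    return sum(len(t) for t in transactions) / len(transactions)
```

For example, after registering both agents, `run_all([frozenset({"a", "b"}), frozenset({"a"})])` yields the item counts and the mean length 1.5; calling `retire("avg_len")` removes that technique without touching the others.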

Abstract:

In data stream applications, a good approximation obtained in a timely manner is often better than an exact answer that is delayed beyond the window of opportunity. Of course, the quality of the approximation is as important as its timely delivery. Unfortunately, existing algorithms capable of online processing do not strictly conform to a precise error guarantee. Since both online processing and a precise error bound are essential, stream algorithms need to meet both criteria. Yet this is not the case for mining frequent sets in data streams. We present EStream, a novel algorithm that allows online processing while producing results strictly within the error bound. Our theoretical and experimental results show that EStream is a better candidate for finding frequent sets in data streams when both constraints need to be satisfied.
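EStream itself is not described in enough detail here to reproduce. To make "online processing within an error bound" concrete, the sketch below uses a different, well-known algorithm, Lossy Counting (Manku and Motwani), restricted to single items rather than itemsets. Its estimates never overcount and undercount by at most eps·N, where N is the number of items seen so far; the class and method names are our own.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Lossy Counting for single items (not EStream).

    Guarantee: true_count - eps * N <= estimate <= true_count.
    """

    def __init__(self, eps: float) -> None:
        if not 0 < eps < 1:
            raise ValueError("eps must be in (0, 1)")
        self.eps = eps
        self.width = math.ceil(1 / eps)  # bucket width
        self.n = 0                       # items seen so far
        # item -> (counted occurrences f, max occurrences missed before insertion)
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            # The item may have occurred up to bucket - 1 times and been pruned.
            self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            self.entries = {
                k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket
            }

    def estimate(self, item: Hashable) -> int:
        return self.entries.get(item, (0, 0))[0]

    def frequent(self, support: float) -> List[Hashable]:
        """Items with estimate >= (support - eps) * N: no false negatives."""
        threshold = (support - self.eps) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]
```

The design trades memory for accuracy through eps alone: a smaller eps means wider buckets, fewer pruned entries and a tighter bound.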

Abstract:

Most algorithms for discovering frequent patterns in data streams assume that the system can handle all incoming transactions without any delay and without needing to drop transactions. However, this assumption is often impractical given the inherent characteristics of data stream environments. Under high load in particular, system resources are often insufficient to process the incoming transactions. This causes unwanted latency, which in turn reduces the usefulness of the resulting data mining models, since they often have only a small window of opportunity. We propose a load shedding algorithm to address this issue. The algorithm adaptively detects overload situations and drops transactions from data streams using a probabilistic model. We tested the algorithm on both synthetic and real-life datasets to verify its feasibility.
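The abstract does not specify the probabilistic model, so the following is only a minimal sketch under an assumed one: overload is detected when a batch of arrivals exceeds a fixed processing capacity, and each transaction is then kept independently with probability capacity / arrivals. `ProbabilisticShedder` and its parameters are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


class ProbabilisticShedder:
    """Hypothetical load shedder (assumed model, not the paper's):
    if a batch exceeds capacity, keep each transaction with probability
    capacity / arrivals so the expected load matches capacity."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.rng = random.Random(seed)

    def keep_probability(self, arrivals: int) -> float:
        # Overload is detected when arrivals exceed capacity.
        if arrivals <= self.capacity:
            return 1.0
        return self.capacity / arrivals

    def shed(self, batch: Sequence[T]) -> List[T]:
        p = self.keep_probability(len(batch))
        if p == 1.0:
            return list(batch)  # no overload: process everything
        # Independent coin flip per transaction; arrival order is preserved.
        return [t for t in batch if self.rng.random() < p]
```

Because every transaction in an overloaded batch faces the same keep probability, the relative support of a pattern among the kept transactions is approximately preserved, which is one reason uniform random dropping is a common baseline for shedding.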

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns (phrases) that can be used as profile descriptors of documents. Data mining algorithms can undoubtedly produce numerous phrases. However, it is difficult to use these phrases effectively to answer what users want. We therefore present a pattern taxonomy extraction model that extracts descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The experimental results show that pattern-based methods outperform keyword-based methods. They also indicate that removing meaningless patterns not only reduces computational cost but also improves the effectiveness of the system.
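The abstract does not define "meaningless" patterns. One plausible criterion, assumed here, is closed-pattern pruning: a sub-pattern whose support equals that of a longer pattern containing it adds no information and is dropped. The function names are hypothetical, and the quadratic scan is for clarity only.

```python
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(small: Sequence[str], big: Sequence[str]) -> bool:
    """True if the terms of `small` appear in `big` in order (gaps allowed)."""
    it = iter(big)
    # `tok in it` advances the shared iterator, so order is enforced.
    return all(tok in it for tok in small)


def prune_non_closed(supports: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep a pattern only if no longer pattern containing it has the same
    support (assumed pruning rule; O(n^2) over the pattern set)."""
    kept: Dict[Pattern, int] = {}
    for p, s in supports.items():
        absorbed = any(
            len(q) > len(p) and supports[q] == s and is_subsequence(p, q)
            for q in supports
        )
        if not absorbed:
            kept[p] = s
    return kept
```

With supports `{("data",): 3, ("mining",): 3, ("data", "mining"): 3, ("stream",): 2}`, the single terms "data" and "mining" are absorbed by the phrase ("data", "mining"), while ("stream",) survives.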

Abstract:

Data mining plays an important role in decision making for business activities and governmental administration. Since many organizations, or their divisions, lack the in-house expertise and infrastructure for data mining, it is beneficial to delegate data mining tasks to external service providers. However, the organizations or divisions risk losing private information during the delegation process. In this paper, we present a Bloom filter based solution that enables organizations or their divisions to delegate the task of mining association rules while protecting data privacy. Our approach achieves high precision in data mining by trading off only storage requirements rather than the level of privacy protection.
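As a rough illustration of how Bloom filters can hide items while still allowing support counting, consider the sketch below. It is a generic construction assumed for illustration, not the paper's protocol: the data owner encodes each transaction as an m-bit filter using k keyed hashes (HMAC-SHA256 with a secret key), and the provider counts a candidate itemset by testing bit masks. `BloomEncoder` and `support` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable, List


class BloomEncoder:
    """Owner side: encode item sets as m-bit Bloom filters using k keyed
    hashes. The key never leaves the owner; the provider sees only bits."""

    def __init__(self, m: int, k: int, key: bytes) -> None:
        self.m, self.k, self.key = m, k, key

    def positions(self, item: str) -> List[int]:
        # k bit positions from HMAC-SHA256, salted with the hash index.
        out = []
        for i in range(self.k):
            msg = f"{i}:{item}".encode()
            digest = hmac.new(self.key, msg, hashlib.sha256).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def encode(self, items: Iterable[str]) -> int:
        # A Python int serves as the m-bit array.
        bits = 0
        for item in items:
            for pos in self.positions(item):
                bits |= 1 << pos
        return bits


def support(encoded_db: List[int], query_mask: int) -> int:
    """Provider side: count filters containing every bit of the query.
    Bloom filters give no false negatives, so this can only overestimate."""
    return sum(1 for bits in encoded_db if bits & query_mask == query_mask)
```

Raising m lowers the false-positive rate and so tightens the support estimates at the cost of storage, which mirrors the storage-for-precision trade-off the abstract describes. A production scheme would need further care, for instance because identical transactions produce identical filters.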

Abstract:

This study is motivated by How [How, J., 2000. The initial and long run performances of mining IPOs in Australia. Aust. J. Manage. 25, 95–118], who examined 100 Australian gold mining initial public offerings (IPOs) from 1979 to 1990 and reported an average underpricing return of 119.51%. This study updates that analysis by investigating 114 Australian gold mining IPOs from 1994 to 2004 and finds a significantly lower average first-day return of 13.3%. These returns can be partly explained by options offered to underwriters and by the change in either the Gold Index or the All Ordinaries Index from the prospectus date to the listing date.
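The abstract does not state how the first-day return is computed. A common raw measure, assumed here and possibly differing from the study's exact (for example, market-adjusted) definition, is the percentage change from the offer price $P_{i,0}$ of IPO $i$ to its first-day closing price $P_{i,1}$, averaged over the $n$ IPOs in the sample:

```latex
R_i = \frac{P_{i,1} - P_{i,0}}{P_{i,0}} \times 100\%,
\qquad
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} R_i
```

Under this definition, an average of 13.3% means that, on average, listed shares closed their first day about 13% above the offer price.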

Abstract:

Arsenic is a proven carcinogen often found at high concentrations in association with gold and other heavy metals. The freshwater yabby, Cherax destructor Clark (Decapoda, Parastacidae), is a ubiquitous species native to Australia's central and eastern regions, with a growing international commercial market. However, in this region of Australia, yabby farmers often harvest organisms from old mine tailings dams with elevated environmental arsenic levels. Yabbies exposed to elevated environmental arsenic were found to accumulate and store as much as 100 μg/g arsenic in their tissues. The accumulation is proportional to the concentration of arsenic in the sediment and is high enough to be of concern for people who eat the yabbies. A comparison of arsenic levels in wild and lab-fed animals was also performed. Although there was no significant difference in arsenic levels among the various organs of the wild animals, the animals purchased from a yabby farm showed a significantly higher arsenic concentration in their hepatopancreas (3.7 ± 0.9 μg/g) than in other organs (0.6–1.8 μg/g). Furthermore, after a 40-day exposure to food containing 200 to 300 μg/g inorganic arsenic, arsenate (As[V])-exposed animals showed a significant increase in tissue-specific arsenic accumulation, whereas arsenite (As[III])-exposed animals showed a smaller, nonsignificant increase in As uptake, primarily in the hepatopancreas. These results have important implications for yabby growers and consumers alike.

Abstract:

For most data stream applications, the volume of data is too large to be stored on permanent storage devices or to be scanned thoroughly more than once. It is therefore recognized that approximate answers are usually sufficient: a good approximation obtained in a timely manner is often better than an exact answer that is delayed beyond the window of opportunity. Unfortunately, this is not the case for mining frequent patterns over data streams, where algorithms capable of processing data streams online do not strictly conform to a precise error guarantee. Since the quality of approximate answers is as important as their timely delivery, algorithms must be designed to meet both criteria at the same time. In this paper, we propose an algorithm that processes streaming data online while guaranteeing that the support error of frequent patterns stays strictly within a user-specified threshold. Our theoretical and experimental studies show that the algorithm is an effective and reliable method for finding frequent sets in data stream environments when both constraints need to be satisfied.
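To show what a support error "strictly within a user-specified threshold" means in code, here is the Misra-Gries frequent-items summary. It is a well-known deterministic algorithm and not the one proposed in the paper, and it handles single items rather than itemsets. With k = ceil(1/eps) counters, every estimate undercounts the true frequency by at most N/k ≤ eps·N. The class name and methods are our own.

```python
import math
from typing import Dict, Hashable, List


class MisraGries:
    """Misra-Gries summary with at most k - 1 counters, k = ceil(1/eps).

    Guarantee: true_count - eps * N <= estimate <= true_count.
    """

    def __init__(self, eps: float) -> None:
        if not 0 < eps < 1:
            raise ValueError("eps must be in (0, 1)")
        self.eps = eps
        self.k = math.ceil(1 / eps)
        self.n = 0  # items seen so far
        self.counters: Dict[Hashable, int] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement every counter (the new item is
            # implicitly decremented too) and drop counters that reach zero.
            # Each such step discards k occurrences, so it happens <= N/k times.
            self.counters = {x: c - 1 for x, c in self.counters.items() if c > 1}

    def estimate(self, item: Hashable) -> int:
        return self.counters.get(item, 0)

    def frequent(self, support: float) -> List[Hashable]:
        """Items with estimate >= (support - eps) * N: no false negatives."""
        threshold = (support - self.eps) * self.n
        return [x for x, c in self.counters.items() if c >= threshold]
```

Memory is fixed at k - 1 counters regardless of stream length, so the user-specified eps directly sets both the space used and the worst-case support error.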