860 results for Frequent mining
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
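The difference between Boolean and ordinary matrix multiplication on binary matrices can be illustrated with a minimal sketch. The function names and the small matrices below are hypothetical illustrations, not taken from the thesis, and the sketch shows only the product and the reconstruction error, not any of the proposed decomposition algorithms.

```python
def boolean_product(b, c):
    """Boolean product: entry (i, j) is 1 if b[i][k] and c[k][j] are both 1 for some k."""
    inner, cols = len(c), len(c[0])
    return [[int(any(row[k] and c[k][j] for k in range(inner))) for j in range(cols)]
            for row in b]


def integer_product(b, c):
    """Ordinary matrix product, shown for comparison."""
    inner, cols = len(c), len(c[0])
    return [[sum(row[k] * c[k][j] for k in range(inner)) for j in range(cols)]
            for row in b]


def reconstruction_error(a, b, c):
    """Number of entries where a differs from the Boolean product of b and c."""
    approx = boolean_product(b, c)
    return sum(x != y for row_a, row_p in zip(a, approx) for x, y in zip(row_a, row_p))


# Hypothetical binary factor matrices (3x2 and 2x3).
B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 1, 0], [0, 1, 1]]
A = boolean_product(B, C)
```

In this example the ordinary product contains an entry greater than 1 where both factors overlap, so it is no longer a binary matrix, whereas the Boolean product stays binary and can be compared entry by entry with the original data.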
Abstract:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
Abstract:
Prescribed fire is one of the most widely used management tools for reducing fuel loads in managed forests. However, the long-term effects of repeated prescribed fires on soil carbon (C) and nitrogen (N) pools are poorly understood. This study aimed to investigate how different fire frequency regimes influence C and N pools in the surface soils (0–10 cm). A prescribed fire field experiment in a wet sclerophyll forest established in 1972 in southeast Queensland was used in this study. The fire frequency regimes included long unburnt (NB), burnt every 2 years (2yrB) and burnt every 4 years (4yrB), with four replications. Compared with the NB treatment, the 2yrB treatment lowered soil total C by 44%, total N by 54%, HCl hydrolysable C and N by 48% and 59%, KMnO4 oxidizable C by 81%, microbial biomass C and N by 42% and 33%, cumulative CO2–C by 28%, NaOCl-non-oxidizable C and N by 41% and 51%, and charcoal-C by 17%, respectively. The 4yrB and NB treatments showed no significant differences for these soil C and N pools. All soil labile, biologically active, recalcitrant and total C and N pools were correlated positively with each other and with soil moisture content, but negatively correlated with soil pH. The C:N ratios of different C and N pools were greater in the burned treatments than in the NB treatments. This study has highlighted that prescribed burning at four-year intervals is a more sustainable management practice for this subtropical forest ecosystem.
Abstract:
Springsure Creek Coal (SCC) intends to develop a coal mine using the longwall mining process under grain farming land near Emerald in Central Queensland (CQ). While this technology will result in some subsidence of the land surface, SCC wishes to maintain productivity of the grain cropping land in the precinct after coal mining. However, the impact of the surface subsidence resulting from that mining process on productivity of cropping land in any Australian landscape is currently unclear. A research protocol is proposed to investigate the impacts of subsidence on grain productivity once the SCC project becomes operational. The protocol has wider application for other similar mining projects throughout the country. A copy of the full report is accessible on www.aginstitute.com.au.
Abstract:
Web-based technology is particularly well-suited to promoting active student involvement in the processes of learning. All students enrolled in a first-year educational psychology unit were required to complete ten weekly online quizzes, ten weekly student-generated questions and ten weekly student answers to those questions. Results of an online survey of participating students strongly support the viability and perceived benefits of such an instructional approach. Although students reported that the 30 assessments were useful and reasonable, the most common theme to emerge from the professional reflections of participating lecturers was that the marking of questions and answers was unmanageable.
Abstract:
This thesis increased the researcher's understanding of the relationship between operations and maintenance in underground longwall coal mines, using data from a Queensland underground coal mine. The thesis explores various relationships between recorded variables. Issues with human-recorded data were uncovered, and the results emphasised the significance of variables associated with conveyor operation in explaining production.
Abstract:
Herbivorous insects comprise a major part of terrestrial biodiversity, and their interactions with their host plants and natural enemies are of vast ecological importance. A large body of research demonstrates that the ecology and evolution of these insects may be affected by trophic interactions, by abiotic influences, and by intraspecific processes, but so far research on these individual aspects has rarely been combined. This thesis uses the leaf-mining moth Tischeria ekebladella and the pedunculate oak (Quercus robur) as a case study to assess how spatial variation in trophic interactions and the physical distribution of host trees jointly affect the distribution, dynamics and evolution of a host-specific herbivore. With respect to habitat quality, Tischeria ekebladella experiences abundant variation at several spatial scales. Most of this variation occurs at small scales, notably among leaves and shoots within individual trees. While hypothetically this could cause moths to evolve an ability to select leaves and shoots of high quality, I did not find any coupling between female preference and offspring performance. Based on my studies on temporal variation in resource quality, I therefore propose that unpredictable temporal changes in the relative rankings of individual resource units may render it difficult for females to predict the fate of their developing offspring. With respect to intraspecific processes, my results suggest that limited moth dispersal in relation to the spatial distribution of oak trees plays a key role in determining the regional distribution of Tischeria ekebladella. The distribution of the moth is aggregated at the landscape level, where local leaf miner populations are less likely to be present where oaks are scarce.
A modelling exercise based on empirical dispersal estimates revealed that the moth population on Wattkast, an island in south-western Finland, is spatially structured overall, but that the relative importance of local and regional processes on tree-specific moth dynamics varies drastically across the landscape. To conclude, my work in the oak-Tischeria ekebladella system demonstrates that the local abundance and regional distribution of a herbivore may be more strongly influenced by the spatial location of host trees than by their relative quality. Hence, it reveals the importance of considering spatial context in the study of herbivorous insects, and forms a bridge between the classical fields of plant-insect interactions and spatial ecology.
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in various real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. PBTMSum combines pattern mining techniques with LDA topic modelling to generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the data of the Document Understanding Conference (DUC) 2007. The results demonstrate the effectiveness and efficiency of the proposed approach.
Abstract:
The role of the Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation, such as Acidithiobacillus thiooxidans and Leptospirillum ferrooxidans, were isolated from abandoned mines, waste rocks and tailing dumps. Arsenite-oxidizing Thiomonas and Bacillus groups of bacteria were isolated and their ability to oxidize As(III) to As(V) established. Mine-isolated sulfate-reducing bacteria were used to remove dissolved copper, zinc, iron and arsenic from solutions.
Abstract:
With the development of wearable and mobile computing technology, more and more people start using sleep-tracking tools to collect personal sleep data on a daily basis aiming at understanding and improving their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analyse those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis on personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in sleep of different people, and SleepExplorer could help users discover factors that are most relevant to their personal sleep.
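The association rule mining mentioned above rests on two standard measures, support and confidence, which can be sketched as follows. This is a minimal illustration under stated assumptions: the daily records, item names, thresholds and function names are hypothetical, and the sketch enumerates only single-item rules by brute force. It is not the SleepExplorer implementation.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def single_item_rules(transactions, min_support, min_confidence):
    """Return rules (a, b, support, confidence) for 'a implies b' above both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, transactions)
        s_a = support({a}, transactions)
        # Confidence of a -> b is support(a and b) divided by support(a).
        if s_ab >= min_support and s_a > 0 and s_ab / s_a >= min_confidence:
            found.append((a, b, s_ab, s_ab / s_a))
    return found


# Hypothetical daily records: contextual factors plus a sleep outcome.
days = [
    {"exercise", "good_sleep"},
    {"exercise", "good_sleep"},
    {"late_coffee", "poor_sleep"},
    {"exercise", "late_coffee", "poor_sleep"},
]
rules = single_item_rules(days, min_support=0.5, min_confidence=0.9)
```

Because the records are mined per person, the same factor can yield a strong rule for one user and none for another, which matches the finding that contextual factors play distinct roles for different people.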
Abstract:
Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where the population size, the number of crossover points and the mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method, stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification data mining problems. A Michigan-style classifier was used to build the classifier, and the system was tested with machine learning databases including the Pima Indian Diabetes database, the Wisconsin Breast Cancer database and a few others. The algorithm performed better than the other methods it was compared with.
Abstract:
The Social Water Assessment Protocol (SWAP) is a tool consisting of a series of questions on fourteen themes designed to capture the social context of water around a mine site. A pilot study of the SWAP, conducted in Prestea-Huni Valley, Ghana, showed that some communities were concerned about whether the groundwater was potable. The mining company’s concern was that there was a cycle of dependency amongst communities that received treated water from the mining company. The pilot identified potential data sources and stakeholder groups for each theme, gaps in themes and suggested refinements to questions to improve the SWAP.
Abstract:
Many developing countries are experiencing rapid expansion in mining with associated water impacts. In most cases mining expansion is outpacing the building of national capacity to ensure that sustainable water management practices are implemented. Since 2011, Australia's International Mining for Development Centre (IM4DC) has funded capacity building in such countries including a program of water projects. Five projects in particular (principally covering experiences from Peru, Colombia, Ghana, Zambia, Indonesia, Philippines and Mongolia) have provided insight into water capacity building priorities and opportunities. This paper reviews the challenges faced by water stakeholders, and proposes the associated capacity needs. The paper uses the evidence derived from the IM4DC projects to develop a set of specific capacity-building recommendations. Recommendations include: the incorporation of mine water management in engineering and environmental undergraduate courses; secondments of staff to suitable partner organisations; training to allow site staff to effectively monitor water including community impacts; leadership training to support a water stewardship culture; training of officials to support implementation of catchment management approaches; and the empowerment of communities to recognise and negotiate solutions to mine-related risks. New initiatives to fund the transfer of multi-disciplinary knowledge from nations with well-developed water management practices are called for.
Abstract:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
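A spectrum kernel compares two sequences by counting the length-k subsequences they share. The sketch below shows the standard k-spectrum kernel applied to system call traces. The traces, the choice k = 2 and the function names are hypothetical, and the paper's actual features, kernel parameters and SVM setup are not specified here.

```python
from collections import Counter


def kmer_counts(calls, k):
    """Count every contiguous run of k system calls in a trace."""
    return Counter(tuple(calls[i:i + k]) for i in range(len(calls) - k + 1))


def spectrum_kernel(x, y, k):
    """k-spectrum kernel: inner product of the two traces' k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    return sum(count * cy[kmer] for kmer, count in cx.items())


# Hypothetical system call traces.
trace_a = ["open", "read", "read", "close"]
trace_b = ["open", "read", "write", "close"]

shared = spectrum_kernel(trace_a, trace_b, k=2)
self_similarity = spectrum_kernel(trace_a, trace_a, k=2)
```

A kernel matrix built this way over all pairs of traces can be passed to an SVM that accepts precomputed kernels. Only k-mers that actually occur are stored, which keeps the computation cheap.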
Abstract:
In this study, for the first time, two distinct genetic lineages of Puumala virus (PUUV) were found within a small sampling area and within a single host genetic lineage (Ural mtDNA) at Pallasjarvi, northern Finland. Lung tissue samples of 171 bank voles (Myodes glareolus) trapped in September 1998 were screened for the presence of PUUV nucleocapsid antigen and 25 were found to be positive. Partial sequences of the PUUV small (S), medium (M) and large (L) genome segments were recovered from these samples using RT-PCR. Phylogenetic analysis revealed two genetic groups of PUUV sequences that belonged to the Finnish and north Scandinavian lineages. This presented a unique opportunity to study inter-lineage reassortment in PUUV; indeed, 32% of the studied bank voles appeared to carry reassortant virus genomes. Thus, the frequency of inter-lineage reassortment in PUUV was comparable to that of intra-lineage reassortment observed previously (Razzauti, M., Plyusnina, A., Henttonen, H. & Plyusnin, A. (2008). J Gen Virol 89, 1649-1660). Of six possible reassortant S/M/L combinations, only two were found at Pallasjarvi and, notably, in all reassortants, both S and L segments originated from the same genetic lineage, suggesting a non-random pattern for the reassortment. These findings are discussed in connection to PUUV evolution in Fennoscandia.