861 results for Mega-mining
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs results that are interpretable, and what is considered interpretable in data mining can be very different from what is considered interpretable in linear algebra.
The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability (the factor matrices are of the same type as the original matrix) and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
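To make the difference between Boolean and normal matrix multiplication concrete, the Python sketch below (not code from the thesis; all function names are hypothetical) computes the Boolean product, where each entry is the OR over k of B[i][k] AND C[k][j], next to the ordinary integer product of the same binary factors.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(B: Matrix, C: Matrix) -> Matrix:
    """Boolean product: A[i][j] = OR over k of (B[i][k] AND C[k][j])."""
    inner = len(C)
    if any(len(row) != inner for row in B):
        raise ValueError("inner dimensions must match")
    return [[int(any(B[i][k] and C[k][j] for k in range(inner)))
             for j in range(len(C[0]))]
            for i in range(len(B))]


def integer_product(B: Matrix, C: Matrix) -> Matrix:
    """Ordinary matrix product, shown for comparison."""
    inner = len(C)
    return [[sum(B[i][k] * C[k][j] for k in range(inner))
             for j in range(len(C[0]))]
            for i in range(len(B))]


def reconstruction_error(A: Matrix, B: Matrix, C: Matrix) -> int:
    """Number of cells in which A differs from the Boolean product of B and C."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


# Toy example: 3 rows explained by 2 overlapping binary factors.
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0, 0],
     [0, 1, 1, 1]]

print(boolean_product(B, C))  # [[1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]]
print(integer_product(B, C))  # [[1, 1, 0, 0], [1, 2, 1, 1], [0, 1, 1, 1]]
```

The second row is covered by both factors: the Boolean product still yields a binary value, whereas the integer product yields 2 in the overlapping column, which has no direct binary interpretation. Counting mismatching cells, as reconstruction_error does, is one natural objective for such decompositions.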
Abstract:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in an online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping user information as much as possible on the user's own personal device, we avoid privacy issues and let users control the disclosure of their location information.
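The data model described above (transition events as pairs of a cell identifier and a transition time) can be sketched in Python as follows. The Transition class, the dwell-time calculation and the top_cells ranking are illustrative assumptions, not the thesis's clustering or base-detection algorithms; ranking cells by time spent in them is only a hypothetical starting point for finding significant locations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Transition:
    """One transition event: the phone switched its serving cell to `cell_id` at `time` (seconds)."""
    cell_id: str
    time: float


def dwell_times(events: List[Transition]) -> Dict[str, float]:
    """Total time spent in each cell, assuming the phone stays in a cell until the next transition.

    The final event has no known end time, so it contributes nothing.
    """
    totals: Dict[str, float] = defaultdict(float)
    ordered = sorted(events, key=lambda e: e.time)
    for current, nxt in zip(ordered, ordered[1:]):
        totals[current.cell_id] += nxt.time - current.time
    return dict(totals)


def top_cells(events: List[Transition], n: int = 1) -> List[str]:
    """Hypothetical heuristic: rank cells by total dwell time as candidate significant locations."""
    totals = dwell_times(events)
    return sorted(totals, key=totals.get, reverse=True)[:n]


log = [Transition("A", 0), Transition("B", 600), Transition("A", 660),
       Transition("C", 4260), Transition("D", 4300)]
print(dwell_times(log))  # {'A': 4200.0, 'B': 60.0, 'C': 40.0}
print(top_cells(log))    # ['A']
```

Per-cell totals like these are noisy in practice, since a stationary phone may still switch between nearby towers; grouping cells that correspond to the same physical place is what the clustering step in the abstract addresses.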
Abstract:
Springsure Creek Coal (SCC) intends to develop a coal mine using the longwall mining process under grain farming land near Emerald in Central Queensland (CQ). While this technology will result in some subsidence of the land surface, SCC wishes to maintain the productivity of the grain cropping land in the precinct after coal mining. However, the impact of the surface subsidence resulting from that mining process on the productivity of cropping land in any Australian landscape is currently unclear. A research protocol is proposed to investigate the impacts of subsidence on grain productivity once the SCC project becomes operational. The protocol has wider application to other similar mining projects throughout the country. A copy of the full report is accessible at www.aginstitute.com.au.
Abstract:
This thesis improved understanding of the relationship between operations and maintenance in underground longwall coal mines, using data from a Queensland underground coal mine. The thesis explores various relationships between recorded variables. Issues with human-recorded data were uncovered, and the results emphasised the significance of variables associated with conveyor operation in explaining production.
Abstract:
Herbivorous insects comprise a major part of terrestrial biodiversity, and their interactions with their host plants and natural enemies are of vast ecological importance. A large body of research demonstrates that the ecology and evolution of these insects may be affected by trophic interactions, by abiotic influences and by intraspecific processes, but so far research on these individual aspects has rarely been combined. This thesis uses the leaf-mining moth Tischeria ekebladella and the pedunculate oak (Quercus robur) as a case study to assess how spatial variation in trophic interactions and the physical distribution of host trees jointly affect the distribution, dynamics and evolution of a host-specific herbivore. With respect to habitat quality, Tischeria ekebladella experiences abundant variation at several spatial scales. Most of this variation occurs at small scales, notably among leaves and shoots within individual trees. While this could hypothetically cause moths to evolve an ability to select high-quality leaves and shoots, I did not find any coupling between female preference and offspring performance. Based on my studies of temporal variation in resource quality, I therefore propose that unpredictable temporal changes in the relative rankings of individual resource units may make it difficult for females to predict the fate of their developing offspring. With respect to intraspecific processes, my results suggest that limited moth dispersal relative to the spatial distribution of oak trees plays a key role in determining the regional distribution of Tischeria ekebladella. The distribution of the moth is aggregated at the landscape level, and local leaf miner populations are less likely to be present where oaks are scarce.
A modelling exercise based on empirical dispersal estimates revealed that the moth population on Wattkast, an island in south-western Finland, is spatially structured overall, but that the relative importance of local and regional processes for tree-specific moth dynamics varies drastically across the landscape. To conclude, my work on the oak-Tischeria ekebladella system demonstrates that the local abundance and regional distribution of a herbivore may be more strongly influenced by the spatial location of host trees than by their relative quality. Hence, it reveals the importance of considering spatial context in the study of herbivorous insects, and forms a bridge between the classical fields of plant-insect interactions and spatial ecology.
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in various real-world applications. Most existing approaches adopt term-based representations of documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. By combining pattern mining techniques with LDA topic modelling, PBTMSum generates discriminative and semantically rich representations of topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments were conducted on the data of the Document Understanding Conference (DUC) 2007. The results demonstrate the effectiveness and efficiency of the proposed approach.
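The idea of selecting representative yet non-redundant sentences from mined patterns can be illustrated with a deliberately simplified Python sketch. It is not PBTMSum: there is no LDA topic model here, the "patterns" are just term pairs that co-occur in at least min_support sentences, and sentences are chosen greedily by how many not-yet-covered patterns they add. All names and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set


def tokenize(sentence: str) -> Set[str]:
    """Very small tokenizer: lower-case, drop full stops, split on whitespace."""
    return set(sentence.lower().replace(".", "").split())


def frequent_pairs(sentences: List[str], min_support: int) -> Set[FrozenSet[str]]:
    """Term pairs that co-occur in at least `min_support` sentences."""
    counts: Counter = Counter()
    for s in sentences:
        for pair in combinations(sorted(tokenize(s)), 2):
            counts[frozenset(pair)] += 1
    return {p for p, c in counts.items() if c >= min_support}


def summarize(sentences: List[str], min_support: int = 2, k: int = 2) -> List[str]:
    """Greedily pick up to k sentences, each covering the most not-yet-covered patterns."""
    patterns = frequent_pairs(sentences, min_support)
    covered: Set[FrozenSet[str]] = set()
    chosen: List[int] = []
    for _ in range(k):
        best, best_gain = None, 0
        for i, s in enumerate(sentences):
            if i in chosen:
                continue
            tokens = tokenize(s)
            gain = sum(1 for p in patterns - covered if p <= tokens)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds a new pattern
            break
        chosen.append(best)
        covered |= {p for p in patterns if p <= tokenize(sentences[best])}
    return [sentences[i] for i in chosen]


docs = [
    "mining companies manage water",
    "mining companies manage water carefully",
    "communities depend on treated water",
    "communities depend on treated water supplies",
]
print(summarize(docs))
```

The near-duplicate of the first selected sentence adds no new patterns, so the second slot goes to a sentence about the other theme; that is the redundancy-avoidance behaviour the abstract describes, achieved here by a much cruder mechanism.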
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record the activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns in vast amounts of symbolic time series data. In this paper we show that frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of frequent episodes discovered from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
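As a rough illustration of counting a serial episode under inter-event time constraints, the Python sketch below counts non-overlapped occurrences of an episode such as A followed by B, requiring lo < t(next) - t(previous) <= hi between consecutive events (for spike trains, a window like this might correspond to an assumed synaptic delay). The paper's own algorithms are not reproduced here; this is a simplified scan that assumes the episode's event types are distinct, and the neuron labels and times are made up.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float]  # (event type, e.g. a neuron id; time of occurrence, e.g. in ms)


def count_serial_episode(events: List[Event], episode: Sequence[str],
                         lo: float, hi: float) -> int:
    """Count non-overlapped occurrences of a serial episode under inter-event delay constraints.

    An occurrence of (E1, ..., EN) is a set of events of those types, in that order,
    with lo < t(e_{i+1}) - t(e_i) <= hi for every consecutive pair.
    """
    if len(set(episode)) != len(episode):
        raise ValueError("this sketch assumes distinct event types in the episode")
    position = {etype: i for i, etype in enumerate(episode)}
    # partial[i] holds end times of partial occurrences matching episode[:i + 1].
    partial: List[List[float]] = [[] for _ in episode]
    count = 0
    for etype, t in sorted(events, key=lambda e: e[1]):
        i = position.get(etype)
        if i is None:
            continue  # event type not part of this episode
        # The first node always starts a partial occurrence; later nodes must
        # extend an existing partial occurrence within the allowed delay window.
        if i > 0 and not any(lo < t - s <= hi for s in partial[i - 1]):
            continue
        if i == len(episode) - 1:
            count += 1
            # Non-overlapped counting: the next occurrence must start after this one ends.
            partial = [[] for _ in episode]
        else:
            partial[i].append(t)
    return count


spikes = [("A", 0), ("B", 3), ("A", 10), ("B", 12.5), ("A", 20), ("B", 30), ("C", 31)]
print(count_serial_episode(spikes, ("A", "B"), lo=2, hi=4))   # 2
print(count_serial_episode(spikes, ("A", "B"), lo=0, hi=20))  # 3
```

Because the scan completes each occurrence as early as possible and then restarts, it behaves like earliest-finishing interval selection, which maximizes the number of non-overlapped occurrences found.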
Abstract:
The role of the Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation, such as Acidithiobacillus species (including Acidithiobacillus thiooxidans) and Leptospirillum ferrooxidans, were isolated from abandoned mines, waste rocks and tailing dumps. Arsenite-oxidizing Thiomonas and Bacillus bacteria were isolated, and their ability to oxidize As(III) to As(V) was established. Sulfate-reducing bacteria isolated from the mines were used to remove dissolved copper, zinc, iron and arsenic from solutions.
Abstract:
With the development of wearable and mobile computing technology, more and more people have started using sleep-tracking tools to collect personal sleep data daily, with the aim of understanding and improving their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize the sleep data on a dashboard rather than analyse them in combination with these contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis of personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play distinct roles in the sleep of different people, and that SleepExplorer can help users discover the factors most relevant to their personal sleep.
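The support and confidence measures behind association rule mining can be sketched for rules of the form "contextual factors imply good sleep". This Python example is an assumption-laden illustration, not SleepExplorer's implementation: each day is a set of hypothetical labels (for example "exercise" or "steps>10k"), and candidate antecedents are enumerated exhaustively rather than with a pruning algorithm such as Apriori.

```python
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, ...], float, float]  # (antecedent, support, confidence)


def rules_for_target(days: List[Set[str]], target: str, min_support: float,
                     min_confidence: float, max_len: int = 2) -> List[Rule]:
    """All rules antecedent -> target meeting both thresholds, best confidence first.

    support    = fraction of days containing the antecedent and the target
    confidence = fraction of antecedent days that also contain the target
    """
    n = len(days)
    factors = sorted(set().union(*days) - {target})
    results: List[Rule] = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(factors, size):
            a = set(antecedent)
            n_a = sum(1 for d in days if a <= d)
            n_ay = sum(1 for d in days if a <= d and target in d)
            if n_a == 0:
                continue
            support, confidence = n_ay / n, n_ay / n_a
            if support >= min_support and confidence >= min_confidence:
                results.append((antecedent, support, confidence))
    return sorted(results, key=lambda r: (-r[2], -r[1]))


days = [
    {"exercise", "steps>10k", "good_sleep"},
    {"exercise", "good_sleep"},
    {"late_coffee"},
    {"exercise", "late_coffee"},
    {"steps>10k", "good_sleep"},
]
for antecedent, support, confidence in rules_for_target(days, "good_sleep", 0.2, 0.6):
    print(antecedent, round(support, 2), round(confidence, 2))
```

Running the same function on another person's records could rank the factors differently, which mirrors the abstract's observation that the same contextual factors can play distinct roles for different people.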
Abstract:
Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic algorithms are efficient and robust search and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), in which the population size, the number of crossover points and the mutation rate of each population are set adaptively. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method, showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search of the other regions of the search space. The performance of the algorithm is then demonstrated on standard testbed functions and a set of real classification data mining problems. A Michigan-style approach was used to build the classifier, and the system was tested with machine learning databases including the Pima Indians Diabetes database, the Wisconsin Breast Cancer database and a few others. Our algorithm performs better than the other methods it was compared with.
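SAMGA's exact adaptation and migration rules are not given in this abstract, so the Python sketch below is only a simplified island-model GA in the same spirit, run on the toy OneMax problem (maximize the number of 1 bits, an assumption chosen for illustration). Each island adapts its own mutation rate and number of crossover points: islands whose mean fitness is below the average across islands explore more, the others exploit more. Migration is triggered only for islands whose best individual is worse than the global best. Population-size adaptation and the schema analysis are omitted.

```python
import random
from typing import List, Tuple

Individual = List[int]


def onemax(bits: Individual) -> int:
    """Toy fitness: number of 1 bits."""
    return sum(bits)


def k_point_crossover(a: Individual, b: Individual, k: int, rng: random.Random) -> Individual:
    """Alternate between parents at k distinct random cut points."""
    points = sorted(rng.sample(range(1, len(a)), k))
    child: Individual = []
    take_a, prev = True, 0
    for p in points + [len(a)]:
        child.extend((a if take_a else b)[prev:p])
        take_a, prev = not take_a, p
    return child


def mutate(bits: Individual, rate: float, rng: random.Random) -> Individual:
    return [1 - x if rng.random() < rate else x for x in bits]


def tournament(pop: List[Individual], rng: random.Random) -> Individual:
    a, b = rng.sample(pop, 2)
    return a if onemax(a) >= onemax(b) else b


def island_ga(n_islands: int = 3, pop_size: int = 20, length: int = 40,
              generations: int = 60, seed: int = 0) -> Tuple[Individual, List[int]]:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    rates = [1.0 / length] * n_islands
    kpts = [1] * n_islands
    history: List[int] = []
    for _ in range(generations):
        means = [sum(map(onemax, pop)) / len(pop) for pop in islands]
        overall = sum(means) / len(means)
        for i, pop in enumerate(islands):
            # Self-adaptation: below-average islands explore, above-average islands exploit.
            if means[i] < overall:
                rates[i], kpts[i] = min(0.2, rates[i] * 1.5), min(4, kpts[i] + 1)
            else:
                rates[i], kpts[i] = max(0.5 / length, rates[i] / 1.5), max(1, kpts[i] - 1)
            children = [max(pop, key=onemax)]  # elitism: keep the island's best unchanged
            while len(children) < pop_size:
                child = k_point_crossover(tournament(pop, rng), tournament(pop, rng), kpts[i], rng)
                children.append(mutate(child, rates[i], rng))
            islands[i] = children
        # Dynamic migration: copy the global best over the worst member of any island lacking it.
        best = max((max(pop, key=onemax) for pop in islands), key=onemax)
        for pop in islands:
            if onemax(max(pop, key=onemax)) < onemax(best):
                worst = min(range(len(pop)), key=lambda j: onemax(pop[j]))
                pop[worst] = list(best)
        history.append(onemax(best))
    return best, history


best, history = island_ga()
print(history[-1], "of", len(best), "bits set")
```

Elitism plus copy-only migration means the best fitness recorded in history can never decrease between generations, which makes runs easy to sanity-check even though exact values depend on the seed.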
Abstract:
The Social Water Assessment Protocol (SWAP) is a tool consisting of a series of questions on fourteen themes, designed to capture the social context of water around a mine site. A pilot study of the SWAP, conducted in Prestea-Huni Valley, Ghana, showed that some communities were concerned about whether the groundwater was potable. The mining company’s concern was that there was a cycle of dependency amongst communities that received treated water from the mining company. The pilot identified potential data sources and stakeholder groups for each theme and gaps in the themes, and suggested refinements to the questions to improve the SWAP.
Abstract:
Many developing countries are experiencing rapid expansion in mining with associated water impacts. In most cases mining expansion is outpacing the building of national capacity to ensure that sustainable water management practices are implemented. Since 2011, Australia's International Mining for Development Centre (IM4DC) has funded capacity building in such countries, including a program of water projects. Five projects in particular (principally covering experiences from Peru, Colombia, Ghana, Zambia, Indonesia, the Philippines and Mongolia) have provided insight into water capacity-building priorities and opportunities. This paper reviews the challenges faced by water stakeholders and identifies the associated capacity needs. The paper uses the evidence derived from the IM4DC projects to develop a set of specific capacity-building recommendations. Recommendations include: the incorporation of mine water management in engineering and environmental undergraduate courses; secondments of staff to suitable partner organisations; training to allow site staff to effectively monitor water, including community impacts; leadership training to support a water stewardship culture; training of officials to support implementation of catchment management approaches; and the empowerment of communities to recognise and negotiate solutions to mine-related risks. New initiatives to fund the transfer of multi-disciplinary knowledge from nations with well-developed water management practices are called for.
Abstract:
Classification of large datasets is a challenging task in data mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach that integrates data abstraction, frequent item generation, compression, classification and the use of rough sets.
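The abstract does not specify the compression scheme, so the following Python sketch assumes one simple way to classify data in compressed form: binary feature vectors are stored as run lengths, Hamming distances are computed by walking the two run-length lists directly (never decompressing), and a 1-nearest-neighbour rule assigns the label. Frequent item generation and rough sets are not modelled, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Runs = List[int]


def rle(bits: Sequence[int]) -> Runs:
    """Run lengths of a binary vector, starting with a run of 0s (which may be empty)."""
    runs: Runs = []
    current, length = 0, 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)
            current, length = b, 1
    runs.append(length)
    return runs


def hamming_rle(r1: Runs, r2: Runs) -> int:
    """Hamming distance between two equal-length binary vectors, given only their run lengths."""
    i = j = 0
    left1, left2 = r1[0], r2[0]
    dist = 0
    while i < len(r1) and j < len(r2):
        step = min(left1, left2)
        if i % 2 != j % 2:  # run number i holds bit value i % 2
            dist += step
        left1 -= step
        left2 -= step
        if left1 == 0:
            i += 1
            if i < len(r1):
                left1 = r1[i]
        if left2 == 0:
            j += 1
            if j < len(r2):
                left2 = r2[j]
    return dist


def classify(query_bits: Sequence[int], training: List[Tuple[Runs, str]]) -> str:
    """1-nearest-neighbour label, with every distance computed on run-length encodings."""
    q = rle(query_bits)
    return min(training, key=lambda item: hamming_rle(q, item[0]))[1]


training = [(rle([1, 1, 1, 0, 0, 0]), "a"), (rle([0, 0, 0, 1, 1, 1]), "b")]
print(classify([1, 1, 0, 0, 0, 0], training))  # a
```

The cost of each distance grows with the number of runs rather than the number of features, so long, sparse binary patterns compress well and can be compared quickly.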
Abstract:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and applying appropriate data mining techniques to the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatically identifying fault causes. Spectrum kernels and SVMs are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated on an intrusion dataset containing system call traces. The results show that the kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
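A spectrum kernel compares two sequences by the inner product of their k-mer count vectors. The Python sketch below applies it to system call sequences; the call names and the choice of k are illustrative assumptions, and the paper's exact feature settings are not known from this abstract. It assumes every sequence has at least k calls, otherwise the normalized kernel would divide by zero.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence


def spectrum(seq: Sequence[str], k: int) -> Counter:
    """Counts of every contiguous length-k subsequence (k-mer) of system calls."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def spectrum_kernel(s: Sequence[str], t: Sequence[str], k: int = 3) -> int:
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    a, b = spectrum(s, k), spectrum(t, k)
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller feature map; Counter returns 0 for missing keys
    return sum(count * b[kmer] for kmer, count in a.items())


def normalized_kernel(s: Sequence[str], t: Sequence[str], k: int = 3) -> float:
    """Cosine-normalized kernel, so that traces of different lengths are comparable."""
    return spectrum_kernel(s, t, k) / sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))


def gram_matrix(seqs: List[Sequence[str]], k: int = 3) -> List[List[int]]:
    """Pairwise kernel values, the input expected by an SVM with a precomputed kernel."""
    return [[spectrum_kernel(s, t, k) for t in seqs] for s in seqs]


s = ["open", "read", "read", "write", "close"]
t = ["open", "read", "write", "close"]
print(spectrum_kernel(s, t, k=2))          # 3
print(round(normalized_kernel(s, t, 2), 3))  # 0.866
```

The Gram matrix can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's SVC with kernel="precomputed". Because the feature map is just k-mer counts, the kernel can be evaluated without ever building the full, very high-dimensional feature vector.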