994 results for Mining industries
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
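The optimal piecewise-constant segmentation described above can be sketched with dynamic programming. This is a minimal illustration, not the thesis's own implementation: it assumes squared error as the cost (so each block is described by its mean), and the names `segment` and `block_error` are hypothetical.

```python
from typing import List, Tuple


def segment(points: List[float], k: int) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Split `points` into k contiguous blocks minimising total squared error.

    Returns (total_error, [(start, end_exclusive, block_mean), ...]).
    Assumption: squared error is the cost, so the block mean is the description.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")

    # Prefix sums give the squared error of any block in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(points):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def block_error(i: int, j: int) -> float:
        """Squared error of points[i:j] around their mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for the first j points using exactly m blocks.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + block_error(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the block boundaries.
    blocks = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        blocks.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    blocks.reverse()
    return cost[k][n], blocks
```

The triple loop runs in O(k n^2) time. Choosing `k` itself is the model selection question the abstract raises and is not addressed by this sketch.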
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
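The difference between Boolean and normal matrix multiplication on binary factors can be seen in a small sketch. The function names and the example matrices used in the note below are hypothetical illustrations, not taken from the thesis.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(b: Matrix, c: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is 1 if b[i][k] and c[k][j] are both 1 for some k."""
    inner = len(c)
    return [
        [int(any(b[i][k] and c[k][j] for k in range(inner))) for j in range(len(c[0]))]
        for i in range(len(b))
    ]


def normal_product(b: Matrix, c: Matrix) -> Matrix:
    """Ordinary matrix product over the integers, for comparison."""
    inner = len(c)
    return [
        [sum(b[i][k] * c[k][j] for k in range(inner)) for j in range(len(c[0]))]
        for i in range(len(b))
    ]
```

With the binary factors `[[1, 0], [1, 1], [0, 1]]` and `[[1, 1, 0], [0, 1, 1]]`, the Boolean product stays binary, whereas the normal product contains a 2 in the middle entry because two factors cover the same cell. That is one way to read the abstract's claim that the Boolean product is often the more intuitive one for binary data.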
Abstract:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
Abstract:
This report provides a systematic review of the most economically damaging endemic diseases and conditions for the Australian red meat industry (cattle, sheep and goats). A number of diseases for cattle, sheep and goats have been identified and were prioritised according to their prevalence, distribution, risk factors and mitigation. The economic cost of each disease as a result of production losses, preventive costs and treatment costs is estimated at the herd and flock level, then extrapolated to a national basis using herd/flock demographics from the 2010-11 Agricultural Census by the Australian Bureau of Statistics. Information shortfalls and recommendations for further research are also specified. A total of 17 cattle, 23 sheep and nine goat diseases were prioritised based on feedback received from producer, government and industry surveys, followed by discussions between the consultants and MLA. Assumptions of disease distribution, in-herd/flock prevalence, impacts on mortality/production and costs for prevention and treatment were obtained from the literature where available. Where these data were not available, the consultants used their own expertise to estimate the relevant measures for each disease. Levels of confidence in the assumptions for each disease were estimated, and gaps in knowledge identified. The assumptions were analysed using a specialised Excel model that estimated the per animal, herd/flock and national costs of each important disease. The report was peer reviewed and workshopped by the consultants and experts selected by MLA before being finalised. Consequently, this report is an important resource that will guide and prioritise future research, development and extension activities by a variety of stakeholders in the red meat industry. This report completes Phase I and Phase II of an overall four-Phase project initiative by MLA, with identified data gaps in this report potentially being addressed within the later phases. 
Modelling the economic costs using a consistent approach for each disease ensures that the derived estimates are transparent and can be refined if improved data on prevalence becomes available. This means that the report will be an enduring resource for developing policies and strategies for the management of endemic diseases within the Australian red meat industry.
Abstract:
Springsure Creek Coal (SCC) intends to develop a coal mine using the longwall mining process under grain farming land near Emerald in Central Queensland (CQ). While this technology will result in some subsidence of the land surface, SCC wishes to maintain productivity of the grain cropping land in the precinct after coal mining. However, the impact of the surface subsidence resulting from that mining process on productivity of cropping land in any Australian landscape is currently unclear. A research protocol is proposed to investigate the impacts of subsidence on grain productivity once the SCC project becomes operational. The protocol has wider application for other similar mining projects throughout the country. A copy of the full report is accessible on www.aginstitute.com.au.
Abstract:
The aim of this study is to produce new knowledge about the structure and short-term development of the Finnish national economy in the 1920s and 1930s. The study was carried out by compiling an input-output table describing the national economy for 1928, together with its extension, an input-output model. The data are used to describe the structural interdependencies of the economy, the key industries of production, and their effect on the economy. In addition, the study examines the import dependence of the economy and the effect of import tariffs on prices during the depression of the 1930s. On the basis of the study, the key industries of the Finnish economy in 1928 could be identified: agriculture, forestry, the food industry, the wood industry, the paper industry and construction. The strong role of the food industry in particular was perhaps surprising, especially considering how little attention the industry has received in economic history research. The study showed that Finland's exports were more capital-intensive than its imports. Although this result must be interpreted with caution, the study was able to show and quantify in detail the shares of labour and capital inputs in the output of each industry. The input-output model was used to assess the effect on the economy of the changes in final use of the wood industry, the paper industry and construction that took place in 1928-32. A significant finding is that the change in the final use of construction had a very large growth-reducing effect on the whole economy. The collapse of investment in building construction caused a fall of almost 13 per cent in output in the economy. The effect was even greater than that of the collapse of wood industry exports. On the other hand, the results show that private consumption was of very great importance to the economy. For example, the collapse of wood industry exports caused a decline in output of more than 4%, but when the decline in private consumption was also taken into account in the model, the total effect was more than 10%.
Taking private consumption into account in the model thus more than doubled the effects of the industries on the economy. The results confirmed conclusions about tariff policy presented in earlier studies and showed that the food industry, which is closely linked to agriculture, was the most protected industry in the economy. The other domestic-market industries, however, did not benefit from tariff policy during the depression. The input-output price model showed that tariff policy was not as successful as contemporary studies claimed; at most, tariffs were able to slow the fall in prices. The appendix to the study presents all the key statistical tables describing the Finnish economy in 1928, including use and supply tables, input-output tables, input coefficients, the Leontief inverse matrix, and labour and capital input coefficients.
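The role of the Leontief inverse in an input-output model of this kind can be illustrated with a small sketch: total output x satisfies x = (I - A)^-1 f, where A holds the input coefficients and f is final use. The two-industry coefficients and demand figures used in the note below are hypothetical and are not taken from the 1928 tables; the function names are likewise assumptions.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def leontief_inverse(a: Matrix) -> Matrix:
    """Return (I - A)^-1 by Gauss-Jordan elimination on exact fractions."""
    n = len(a)
    # Build the augmented matrix [I - A | I].
    aug = [
        [Fraction(int(i == j)) - a[i][j] for j in range(n)]
        + [Fraction(int(i == j)) for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        # Pick a row with a non-zero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def total_output(inverse: Matrix, final_use: List[Fraction]) -> List[Fraction]:
    """Output needed from each industry to satisfy the given final use."""
    return [sum(l * f for l, f in zip(row, final_use)) for row in inverse]
```

For the hypothetical coefficient matrix `[[1/5, 3/10], [2/5, 1/10]]`, the inverse is `[[3/2, 1/2], [2/3, 4/3]]`. Because the model is linear, passing a change in final use (for example a fall of 10 units in the second industry) to `total_output` gives the resulting change in every industry's output, which is the kind of calculation the abstract reports for 1928-32.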
Abstract:
This thesis increased the researchers' understanding of the relationship between operations and maintenance in underground longwall coal mines, using data from a Queensland underground coal mine. The thesis explores various relationships between recorded variables. Issues with human-recorded data were uncovered, and the results emphasised the significance of variables associated with conveyor operation in explaining production.
Abstract:
Herbivorous insects comprise a major part of terrestrial biodiversity, and their interactions with their host plants and natural enemies are of vast ecological importance. A large body of research demonstrates that the ecology and evolution of these insects may be affected by trophic interactions, by abiotic influences, and by intraspecific processes, but so far research on these individual aspects has rarely been combined. This thesis uses the leaf-mining moth Tischeria ekebladella and the pedunculate oak (Quercus robur) as a case study to assess how spatial variation in trophic interactions and the physical distribution of host trees jointly affect the distribution, dynamics and evolution of a host-specific herbivore. With respect to habitat quality, Tischeria ekebladella experiences abundant variation at several spatial scales. Most of this variation occurs at small scales, notably among leaves and shoots within individual trees. While hypothetically this could cause moths to evolve an ability to select leaves and shoots of high quality, I did not find any coupling between female preference and offspring performance. Based on my studies on temporal variation in resource quality, I therefore propose that unpredictable temporal changes in the relative rankings of individual resource units may render it difficult for females to predict the fate of their developing offspring. With respect to intraspecific processes, my results suggest that limited moth dispersal in relation to the spatial distribution of oak trees plays a key role in determining the regional distribution of Tischeria ekebladella. The distribution of the moth is aggregated at the landscape level, where local leaf miner populations are less likely to be present where oaks are scarce.
A modelling exercise based on empirical dispersal estimates revealed that the moth population on Wattkast, an island in south-western Finland, is spatially structured overall, but that the relative importance of local and regional processes on tree-specific moth dynamics varies drastically across the landscape. To conclude, my work in the oak-Tischeria ekebladella system demonstrates that the local abundance and regional distribution of a herbivore may be more strongly influenced by the spatial location of host trees than by their relative quality. Hence, it reveals the importance of considering spatial context in the study of herbivorous insects, and forms a bridge between the classical fields of plant-insect interactions and spatial ecology.
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. By combining pattern mining techniques with LDA topic modelling, PBTMSum can generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the data of the Document Understanding Conference (DUC) 2007. The results demonstrate the effectiveness and efficiency of our proposed approach.
Abstract:
This chapter provides a critical legal geography of outer Space, charting the topography of the debates and struggles around its definition, management, and possession. As the emerging field of critical legal geography demonstrates, law is not a neutral organiser of space, but is instead a powerful cultural technology of spatial production. Drawing on legal documents such as the Outer Space Treaty and the Moon Treaty, as well as on the analogous and precedent-setting legal geographies of Antarctica and the deep seabed, the chapter addresses key questions about the legal geography of outer Space, questions which are of growing importance as Space’s available satellite spaces in the geostationary orbit diminish, Space weapons and mining become increasingly viable, Space colonisation and tourism emerge, and questions about Space’s legal status grow in intensity. Who owns outer Space? Who, and whose rules, govern what may or may not (literally) take place there? Is the geostationary orbit the sovereign property of the equatorial states it supertends, as these states argued in the 1970s? Or is it a part of the res communis, or common property of humanity, which currently legally characterises outer Space? Does Space belong to no one, or to everyone? As challenges to the existing legal spatiality of outer Space emerge from spacefaring states, companies, and non-spacefaring states, it is particularly critical that the current spatiality of Space is understood and considered.
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
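The idea of counting a serial episode under a temporal constraint can be sketched as follows. This is a simplified greedy counter written for illustration, not the algorithm presented in the paper: it assumes one shared delay window `[lo, hi]` between consecutive events of the episode, and it counts occurrences left to right without letting two counted occurrences share a time span. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[str, float]  # (event type, time of occurrence)


def find_chain(events: List[Event], start: int, episode: Sequence[str],
               lo: float, hi: float) -> Optional[int]:
    """Index of the last event of an occurrence starting at events[start], or None."""

    def extend(idx: int, pos: int) -> Optional[int]:
        if pos == len(episode):
            return idx
        t = events[idx][1]
        for nxt in range(idx + 1, len(events)):
            gap = events[nxt][1] - t
            if gap > hi:
                break  # events are time-sorted, so later ones are too late as well
            if gap >= lo and events[nxt][0] == episode[pos]:
                end = extend(nxt, pos + 1)
                if end is not None:
                    return end
        return None

    if events[start][0] != episode[0]:
        return None
    return extend(start, 1)


def count_serial_episode(events: List[Event], episode: Sequence[str],
                         lo: float, hi: float) -> int:
    """Greedy count of occurrences of `episode` that do not overlap in time."""
    ordered = sorted(events, key=lambda e: e[1])
    count = 0
    i = 0
    while i < len(ordered):
        end = find_chain(ordered, i, episode, lo, hi)
        if end is None:
            i += 1
        else:
            count += 1
            i = end + 1  # resume after the occurrence just counted
    return count
```

In a spike-train setting the event types would be neuron labels and the times spike times; an episode such as A, B, C that occurs often with delays inside the window is then a candidate for a chain of connections. The window bounds are assumptions to be chosen for the application.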
Abstract:
The role of the Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation, such as Acidithiobacillus thiooxidans and Leptospirillum ferrooxidans, were isolated from abandoned mines, waste rocks and tailing dumps. Arsenite-oxidizing bacteria of the Thiomonas and Bacillus groups were isolated, and their ability to oxidize As(III) to As(V) was established. Sulfate-reducing bacteria isolated from mines were used to remove dissolved copper, zinc, iron and arsenic from solutions.
Abstract:
With the development of wearable and mobile computing technology, more and more people start using sleep-tracking tools to collect personal sleep data on a daily basis aiming at understanding and improving their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analyse those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis on personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in sleep of different people, and SleepExplorer could help users discover factors that are most relevant to their personal sleep.
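The standard association rule measures (support, confidence, lift) that such an analysis relies on can be computed as in the sketch below. It is not SleepExplorer's implementation; the function name and the item labels used in the note are hypothetical.

```python
from typing import Dict, List, Set


def rule_metrics(transactions: List[Set[str]], antecedent: Set[str],
                 consequent: Set[str]) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_ac / n                       # share of days with both sides
    confidence = n_ac / n_a if n_a else 0.0  # share of antecedent days with the consequent
    lift = confidence / (n_c / n) if n_c else 0.0  # confidence relative to the base rate
    return {"support": support, "confidence": confidence, "lift": lift}
```

Each transaction here stands for one day of a single user's log, for example `{"late_coffee", "poor_sleep"}`. Running the same rule over two users' logs separately is one way to see the abstract's point that a contextual factor can matter for one person and not for another.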