11 resultados para deviance mining

em Helda - Digital Repository of University of Helsinki


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Precipitation-induced runoff and leaching from milled peat mining mires by peat types: a comparative method for estimating the loading of water bodies during peat production. This research project in environmental geology has arisen out of an observed need to be able to predict more accurately the loading of watercourses with detrimental organic substances and nutrients from already existing and planned peat production areas, since the authorities capacity for insisting on such predictions covering the whole duration of peat production in connection with evaluations of environmental impact is at present highly limited. National and international decisions regarding monitoring of the condition of watercourses and their improvement and restoration require more sophisticated evaluation methods in order to be able to forecast watercourse loading and its environmental impacts at the stage of land-use planning and preparations for peat production.The present project thus set out from the premise that it would be possible on the basis of existing mire and peat data properties to construct estimates for the typical loading from production mires over the whole duration of their exploitation. Finland has some 10 million hectares of peatland, accounting for almost a third of its total area. Macroclimatic conditions have varied in the course of the Holocene growth and development of this peatland, and with them the habitats of the peat-forming plants. Temperatures and moisture conditions have played a significant role in determining the dominant species of mire plants growing there at any particular time, the resulting mire types and the accumulation and deposition of plant remains to form the peat. The above climatic, environmental and mire development factors, together with ditching, have contributed, and continue to contribute, to the existence of peat horizons that differ in their physical and chemical properties, leading to differences in material transport between peatlands in a natural state and mires that have been ditched or prepared for forestry and peat production. Watercourse loading from the ditching of mires or their use for peat production can have detrimental effects on river and lake environments and their recreational use, especially where oxygen-consuming organic solids and soluble organic substances and nutrients are concerned. It has not previously been possible, however, to estimate in advance the watercourse loading likely to arise from ditching and peat production on the basis of the characteristics of the peat in a mire, although earlier observations have indicated that watercourse loading from peat production can vary greatly and it has been suggested that differences in peat properties may be of significance in this. Sprinkling is used here in combination with simulations of conditions in a milled peat production area to determine the influence of the physical and chemical properties of milled peats in production mires on surface runoff into the drainage ditches and the concentrations of material in the runoff water. Sprinkling and extraction experiments were carried out on 25 samples of milled Carex (C) and Sphagnum (S) peat of humification grades H 2.5 8.5 with moisture content in the range 23.4 89% on commencement of the first sprinkling, which was followed by a second sprinkling 24 hours later. The water retention capacity of the peat was best, and surface runoff lowest, with Sphagnum and Carex peat samples of humification grades H 2.5 6 in the moisture content class 56 75%. On account of the hydrophobicity of dry peat, runoff increased in a fairly regular manner with drying of the sample from 55% to 24 30%. Runoff from the samples with an original moisture content over 55% increased by 63% in the second round of sprinkling relative to the first, as they had practically reached saturation point on the first occasion, while those with an original moisture content below 55% retained their high runoff in the second round, due to continued hydrophobicity. The well-humified samples (H 6.5 8.5) with a moisture content over 80% showed a low water retention capacity and high runoff in both rounds of sprinkling. Loading of the runoff water with suspended solids, total phosphorus and total nitrogen, and also the chemical oxygen demand (CODMn O2), varied greatly in the sprinkling experiment, depending on the peat type and degree of humification, but concentrations of the same substances in the two sprinklings were closely or moderately closely correlated and these correlations were significant. The concentrations of suspended solids in the runoff water observed in the simulations of a peat production area and the direct surface runoff from it into the drainage ditch system in response to rain (sprinkling intensity 1.27 mm/min) varied c. 60-fold between the degrees of humification in the case of the Carex peats and c. 150-fold for the Sphagnum peats, while chemical oxygen demand varied c. 30-fold and c. 50-fold, respectively, total phosphorus c. 60-fold and c. 66-fold, total nitrogen c. 65-fold and c. 195-fold and ammonium nitrogen c. 90-fold and c. 30-fold. The increases in concentrations in the runoff water were very closely correlated with increases in humification of the peat. The correlations of the concentrations measured in extraction experiments (48 h) with peat type and degree of humification corresponded to those observed in the sprinkler experiments. The resulting figures for the surface runoff from a peat production area into the drainage ditches simulated by means of sprinkling and material concentrations in the runoff water were combined with statistics on the mean extent of daily rainfall (0 67 mm) during the frost-free period of the year (May October) over an observation period of 30 years to yield typical annual loading figures (kg/ha) for suspended solids (SS), chemical oxygen demand of organic matter (CODmn O2), total phosphorus (tot. P) and total nitrogen (tot. N) entering the ditches with respect to milled Carex (C) and Sphagnum (S) peats of humification grades H 2.5 8.5. In order to calculate the loading of drainage ditches from a milled peat production mire with the aid of these annual comparative values (in kg/ha), information is required on the properties of the intended production mire and its peat. Once data are available on the area of the mire, its peat depth, peat types and their degrees of humification, dry matter content, calorific value and corresponding energy content, it is possible to produce mutually comparable estimates for individual mires with respect to the annual loading of the drainage ditch system and the surrounding watercourse for the whole service life of the production area, the duration of this service life, determinations of energy content and the amount of loading per unit of energy generated (kg/MWh). In the 8 mires in the Köyhäjoki basin, Central Ostrobothnia, taken as an example, the loading of suspended solids (SS) in the drainage ditch networks calculated on the basis of the typical values obtained here and existing mire and peat data and expressed per unit of energy generated varied between the mires and horizons in the range 0.9 16.5 kg/MWh. One of the aims of this work was to develop means of making better use of existing mire and peat data and the results of corings and other field investigations. In this respect combination of the typical loading values (kg/ha) obtained here for S, SC, CS and C peats and the various degrees of humification (H 2.5 8.5) with the above mire and peat data by means of a computer program for the acquisition and handling of such data would enable all the information currently available and that deposited in the system in the future to be used for defining watercourse loading estimates for mires and comparing them with the corresponding estimates of energy content. The intention behind this work has been to respond to the challenge facing the energy generation industry to find larger peat production areas that exert less loading on the environment and to that facing the environmental authorities to improve the means available for estimating watercourse loading from peat production and its environmental impacts in advance. The results conform well to the initial hypothesis and to the goals laid down for the research and should enable watercourse loading from existing and planned peat production to be evaluated better in the future and the resulting impacts to be taken into account when planning land use and energy generation. The advance loading information available in this way would be of value in the selection of individual peat production areas, the planning of their exploitation, the introduction of water protection measures and the planning of loading inspections, in order to achieve controlled peat production that pays due attention to environmental considerations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices from all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-pace decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements decision-making sets for knowledge discovery and data mining tools and methods, and I study resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated to the existing network operations infrastructure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we consider also a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user s own personal device, we avoid privacy issues and let the users control the disclosure of their location information.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Herbivorous insects comprise a major part of terrestrial biodiversity, and their interactions with their host plants and natural enemies are of vast ecological importance. A large body of research demonstrates that the ecology and evolution of these insects may be affected by trophic interactions, by abiotic influences, and by intraspecific processes, but so far research on these individual aspects has rarely been combined. This thesis uses the leaf-mining moth Tischeria ekebladella and the pedunculate oak (Quercus robur) as a case study to assess how spatial variation in trophic interactions and the physical distribution of host trees jointly affect the distribution, dynamics and evolution of a host-specific herbivore. With respect to habitat quality, Tischeria ekebladella experiences abundant variation at several spatial scales. Most of this variation occurs at small scales notably among leaves and shoots within individual trees. While hypothetically this could cause moths to evolve an ability to select leaves and shoots of high quality, I did not find any coupling between female preference and offspring performance. Based on my studies on temporal variation in resource quality I therefore propose that unpredictable temporal changes in the relative rankings of individual resource units may render it difficult for females to predict the fate of their developing offspring. With respect to intraspecific processes, my results suggest that limited moth dispersal in relation to the spatial distribution of oak trees plays a key role in determining the regional distribution of Tischeria ekebladella. The distribution of the moth is aggregated at the landscape level, where local leaf miner populations are less likely to be present where oaks are scarce. A modelling exercise based on empirical dispersal estimates revealed that the moth population on Wattkast an island in south-western Finland is spatially structured overall, but that the relative importance of local and regional processes on tree-specific moth dynamics varies drastically across the landscape. To conclude, my work in the oak-Tischeria ekebladella system demonstrates that the local abundance and regional distribution of a herbivore may be more strongly influenced by the spatial location of host trees than by their relative quality. Hence, it reveals the importance of considering spatial context in the study of herbivorous insects, and forms a bridge between the classical fields of plant-insect interactions and spatial ecology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetical data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider a prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.