4 results for Minimum distance
in Helda - Digital Repository of University of Helsinki
Abstract:
Reorganizing a dataset so that its hidden structure can be observed is useful in any data analysis task. For example, detecting a regularity in a dataset helps us to interpret the data, compress the data, and explain the processes behind the data. We study datasets that come in the form of binary matrices (tables with 0s and 1s). Our goal is to develop automatic methods that bring out certain patterns by permuting the rows and columns. We concentrate on the following patterns in binary matrices: consecutive-ones (C1P), simultaneous consecutive-ones (SC1P), nestedness, k-nestedness, and bandedness. These patterns reflect specific types of interplay and variation between the rows and columns, such as continuity and hierarchies. Furthermore, their combinatorial properties are interlinked, which helps us to develop the theory of binary matrices and efficient algorithms. Indeed, we can detect all these patterns in a binary matrix efficiently, that is, in time polynomial in the size of the matrix. Since real-world datasets often contain noise and errors, we rarely witness perfect patterns. Therefore we also need to assess how far an input matrix is from a pattern: we count the number of flips (from 0s to 1s or vice versa) needed to bring out the perfect pattern in the matrix. Unfortunately, for most patterns it is an NP-complete problem to find the minimum distance to a matrix that has the perfect pattern, which means that the existence of a polynomial-time algorithm is unlikely. To find patterns in datasets with noise, we need methods that are noise-tolerant and work in practical time with large datasets. The theory of binary matrices gives rise to robust heuristics that perform well on synthetic data and discover easily interpretable structures in real-world datasets: dialectical variation in the spoken Finnish language, division of European locations by the hierarchies found in mammal occurrences, and co-occurring groups in network data.
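To make two of the patterns named above concrete, here is a minimal Python sketch. It is not the thesis's method: the function names are hypothetical, nestedness is taken in the common sense that the rows' sets of 1-columns form a chain under inclusion, and the consecutive-ones check only tests the column order as given rather than searching over permutations.

```python
from typing import List

Matrix = List[List[int]]


def is_nested(matrix: Matrix) -> bool:
    """True if the rows' sets of 1-columns form a chain under inclusion.

    Assumed definition: for every pair of rows, the 1s of one row are a
    subset of the 1s of the other.
    """
    supports = sorted(
        ({j for j, v in enumerate(row) if v} for row in matrix),
        key=len,
    )
    # In a chain, sorting by size makes each set a subset of the next one,
    # so checking neighbouring pairs is enough.
    return all(a <= b for a, b in zip(supports, supports[1:]))


def rows_consecutive_in_given_order(matrix: Matrix) -> bool:
    """True if, in the current column order, the 1s of every row are contiguous.

    This checks one fixed ordering only; it does not decide whether some
    column permutation with this property exists.
    """
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True
```

Both checks run in time polynomial in the size of the matrix. Deciding the consecutive-ones property over all column permutations needs a more involved algorithm and is left out of this sketch.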
In addition to determining the distance from a dataset to a pattern, we need to determine whether the pattern is significant or merely the result of random chance. To this end, we use significance testing: we deem a dataset significant if it appears exceptional when compared to datasets generated from a certain null hypothesis. After detecting a significant pattern in a dataset, it is up to domain experts to interpret the results in terms of the application.
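The significance test described above can be illustrated with a small Monte Carlo sketch in Python. The abstract does not specify the null hypothesis or the test statistic, so both are assumptions here: the null model shuffles all cells while keeping the number of 1s fixed, and the hypothetical statistic `gap_count` counts the 0s lying between the first and last 1 of each row in the given column order (lower means closer to consecutive ones).

```python
import random
from typing import Callable, List

Matrix = List[List[int]]


def gap_count(matrix: Matrix) -> int:
    """Number of 0s strictly between the first and last 1 of each row."""
    total = 0
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v]
        if ones:
            total += (ones[-1] - ones[0] + 1) - len(ones)
    return total


def shuffled_copy(matrix: Matrix, rng: random.Random) -> Matrix:
    """Null sample (assumed model): same shape and number of 1s, cells shuffled."""
    n_cols = len(matrix[0])
    cells = [v for row in matrix for v in row]
    rng.shuffle(cells)
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


def empirical_p_value(
    matrix: Matrix,
    statistic: Callable[[Matrix], int],
    n_samples: int = 999,
    seed: int = 0,
) -> float:
    """Share of null samples whose statistic is at least as low as observed."""
    rng = random.Random(seed)
    observed = statistic(matrix)
    at_least_as_extreme = sum(
        statistic(shuffled_copy(matrix, rng)) <= observed
        for _ in range(n_samples)
    )
    # The +1 terms count the observed matrix itself, so the estimate is never 0.
    return (1 + at_least_as_extreme) / (1 + n_samples)
```

A small p-value suggests that the observed matrix is closer to the pattern than most matrices drawn from the assumed null model. A different null model, for example one that preserves row and column sums, could give a different verdict.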
Abstract:
In this thesis the role played by expansive and introduced species in the phytoplankton ecology of the Baltic Sea was investigated. The aims were threefold. First, the studies investigated the resting stages of dinoflagellates, which were transported into the Baltic Sea via shipping and were able to germinate under the ambient, nutrient-rich, brackish water conditions. The studies also estimated which factors favoured the occurrence and spread of Prorocentrum minimum in the Baltic Sea and discussed the identification of this morphologically variable species. In addition, the classification of phytoplankton species recently observed in the Baltic Sea was discussed. Incubation of sediments from four Finnish ports and 10 ships' ballast tanks revealed that the sediments act as sources of living dinoflagellates and other phytoplankton. Dinoflagellates germinated from the sediments of all ports examined and from 90% of ballast tanks. The concentrations of cells germinating from ballast tank sediments were mostly low compared with the acceptable cell concentrations set by the International Maritime Organization's (IMO's) International Convention for the Control and Management of Ships' Ballast Water and Sediments. However, the IMO allows such high concentrations of small cells in the discharged ballast water that the total number of cells in large ballast water tanks can be very high. Prorocentrum minimum occurred in the Baltic Sea annually but with no obvious trend in the 10-year timespan from 1993 to 2002. The species occurred under wide ranges of temperatures and salinities, and its abundance was positively related especially to the presence of organic nitrogen and phosphorus. This indicated that the species was favoured by increased organic nutrient loading and runoff from land and rivers. The cell shape of P. minimum varied from triangular to oval-round, but morphological fine details indicated that only one morphospecies was present. According to present knowledge, P. minimum is also the only potentially harmful phytoplankton species that has recently expanded widely into new areas of the Baltic Sea.