34 resultados para Imbalanced datasets
Resumo:
There are more than 7000 languages in the world, and many of these have emerged through linguistic divergence. While questions related to the drivers of linguistic diversity have been studied before, including studies with quantitative methods, there is no consensus as to which factors drive linguistic divergence, and how. In the thesis, I have studied linguistic divergence with a multidisciplinary approach, applying the framework and quantitative methods of evolutionary biology to language data. With quantitative methods, large datasets may be analyzed objectively, while approaches from evolutionary biology make it possible to revisit old questions (related to, for example, the shape of the phylogeny) with new methods, and adopt novel perspectives to pose novel questions. My chief focus was on the effects exerted on the speakers of a language by environmental and cultural factors. My approach was thus an ecological one, in the sense that I was interested in how the local environment affects humans and whether this human-environment connection plays a possible role in the divergence process. I studied this question in relation to the Uralic language family and to the dialects of Finnish, thus covering two different levels of divergence. However, as the Uralic languages have not previously been studied using quantitative phylogenetic methods, nor have population genetic methods been previously applied to any dialect data, I first evaluated the applicability of these biological methods to language data. I found the biological methodology to be applicable to language data, as my results were rather similar to traditional views as to both the shape of the Uralic phylogeny and the division of Finnish dialects. I also found environmental conditions, or changes in them, to be plausible inducers of linguistic divergence: whether in the first steps in the divergence process, i.e. dialect divergence, or on a large scale with the entire language family. My findings concerning Finnish dialects led me to conclude that the functional connection between linguistic divergence and environmental conditions may arise through human cultural adaptation to varying environmental conditions. This is also one possible explanation on the scale of the Uralic language family as a whole. The results of the thesis bring insights on several different issues in both a local and a global context. First, they shed light on the emergence of the Finnish dialects. If the approach used in the thesis is applied to the dialects of other languages, broader generalizations may be drawn as to the inducers of linguistic divergence. This again brings us closer to understanding the global patterns of linguistic diversity. Secondly, the quantitative phylogeny of the Uralic languages, with estimated times of language divergences, yields another hypothesis as to the shape and age of the language family tree. In addition, the Uralic languages can now be added to the growing list of language families studied with quantitative methods. This will allow broader inferences as to global patterns of language evolution, and more language families can be included in constructing the tree of the world’s languages. Studying history through language, however, is only one way to illuminate the human past. Therefore, thirdly, the findings of the thesis, when combined with studies of other language families, and those for example in genetics and archaeology, bring us again closer to an understanding of human history.
Resumo:
Optical microscopy is living its renaissance. The diffraction limit, although still physically true, plays a minor role in the achievable resolution in far-field fluorescence microscopy. Super-resolution techniques enable fluorescence microscopy at nearly molecular resolution. Modern (super-resolution) microscopy methods rely strongly on software. Software tools are needed all the way from data acquisition, data storage, image reconstruction, restoration and alignment, to quantitative image analysis and image visualization. These tools play a key role in all aspects of microscopy today – and their importance in the coming years is certainly going to increase, when microscopy little-by-little transitions from single cells into more complex and even living model systems. In this thesis, a series of bioimage informatics software tools are introduced for STED super-resolution microscopy. Tomographic reconstruction software, coupled with a novel image acquisition method STED< is shown to enable axial (3D) super-resolution imaging in a standard 2D-STED microscope. Software tools are introduced for STED super-resolution correlative imaging with transmission electron microscopes or atomic force microscopes. A novel method for automatically ranking image quality within microscope image datasets is introduced, and it is utilized to for example select the best images in a STED microscope image dataset.
Resumo:
The increasing performance of computers has made it possible to solve algorithmically problems for which manual and possibly inaccurate methods have been previously used. Nevertheless, one must still pay attention to the performance of an algorithm if huge datasets are used or if the problem iscomputationally difficult. Two geographic problems are studied in the articles included in this thesis. In the first problem the goal is to determine distances from points, called study points, to shorelines in predefined directions. Together with other in-formation, mainly related to wind, these distances can be used to estimate wave exposure at different areas. In the second problem the input consists of a set of sites where water quality observations have been made and of the results of the measurements at the different sites. The goal is to select a subset of the observational sites in such a manner that water quality is still measured in a sufficient accuracy when monitoring at the other sites is stopped to reduce economic cost. Most of the thesis concentrates on the first problem, known as the fetch length problem. The main challenge is that the two-dimensional map is represented as a set of polygons with millions of vertices in total and the distances may also be computed for millions of study points in several directions. Efficient algorithms are developed for the problem, one of them approximate and the others exact except for rounding errors. The solutions also differ in that three of them are targeted for serial operation or for a small number of CPU cores whereas one, together with its further developments, is suitable also for parallel machines such as GPUs.
Resumo:
Wind power is a rapidly developing, low-emission form of energy production. In Fin-land, the official objective is to increase wind power capacity from the current 1 005 MW up to 3 500–4 000 MW by 2025. By the end of April 2015, the total capacity of all wind power project being planned in Finland had surpassed 11 000 MW. As the amount of projects in Finland is record high, an increasing amount of infrastructure is also being planned and constructed. Traditionally, these planning operations are conducted using manual and labor-intensive work methods that are prone to subjectivity. This study introduces a GIS-based methodology for determining optimal paths to sup-port the planning of onshore wind park infrastructure alignment in Nordanå-Lövböle wind park located on the island of Kemiönsaari in Southwest Finland. The presented methodology utilizes a least-cost path (LCP) algorithm for searching of optimal paths within a high resolution real-world terrain dataset derived from airborne lidar scannings. In addition, planning data is used to provide a realistic planning framework for the anal-ysis. In order to produce realistic results, the physiographic and planning datasets are standardized and weighted according to qualitative suitability assessments by utilizing methods and practices offered by multi-criteria evaluation (MCE). The results are pre-sented as scenarios to correspond various different planning objectives. Finally, the methodology is documented by using tools of Business Process Management (BPM). The results show that the presented methodology can be effectively used to search and identify extensive, 20 to 35 kilometers long networks of paths that correspond to certain optimization objectives in the study area. The utilization of high-resolution terrain data produces a more objective and more detailed path alignment plan. This study demon-strates that the presented methodology can be practically applied to support a wind power infrastructure alignment planning process. The six-phase structure of the method-ology allows straightforward incorporation of different optimization objectives. The methodology responds well to combining quantitative and qualitative data. Additional-ly, the careful documentation presents an example of how the methodology can be eval-uated and developed as a business process. This thesis also shows that more emphasis on the research of algorithm-based, more objective methods for the planning of infrastruc-ture alignment is desirable, as technological development has only recently started to realize the potential of these computational methods.