851 results for statistical methods
Abstract:
This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and an analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving the access to its data. It also contributed to the creation of several new tools for microarray data manipulation and the establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required the creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human readable free text microarray data annotations into categorised format. Data comparability and minimisation of the systematic measurement errors that are characteristic of each laboratory in this large cross-laboratory integrated dataset were ensured by computing a range of microarray data quality metrics and excluding incomparable data. The structure of the global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose-built sample ontology.
A preface and motivation to the construction and analysis of a global map of human gene expression is given by analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporates an indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.
Abstract:
The paradigm of computational vision hypothesizes that any visual function, such as the recognition of your grandparent, can be replicated by computational processing of the visual input. What are these computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, which attempts to learn suitable computations from the natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and the constraints and objectives specified for the learning process. This thesis consists of an introduction and 7 peer-reviewed publications, where the purpose of the introduction is to illustrate the area of study to a reader who is not familiar with computational vision research. In the scope of the introduction, we will briefly overview the primary challenges to visual processing, as well as recall some of the current opinions on visual processing in the early visual systems of animals. Next, we describe the methodology we have used in our research, and discuss the presented results. We have included some additional remarks, speculations and conclusions to this discussion that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast were processed separately in natural systems due to their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence.
Further, we provide first-time reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent orientation) processing due to nonlinear projection pursuit with simple objective functions related to sparseness and response energy optimization. Then, we show that attempting to extract independent components of nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts. Such processing might be applicable for priming, i.e. the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selectable from the environment without optimizing the mechanisms themselves. In summary, this thesis explores learning visual processing on several levels. The learning can be understood as an interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence showing that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms. The work also presents some predictions and ideas regarding biological visual processing.
Abstract:
The development of fishery indicators is a crucial undertaking as it ultimately provides evidence to stakeholders about the status of fished species such as population size and survival rates. In Queensland, as in many other parts of the world, age-abundance indicators (e.g. fish catch rate and/or age composition data) are traditionally used as the evidence basis because they provide information on species life history traits as well as on changes in fishing pressures and population sizes. Often, however, the accuracy of the information from age-abundance indicators can be limited due to missing or biased data. Consequently, improved statistical methods are required to enhance the accuracy, precision and decision-support value of age-abundance indicators.
Abstract:
Lateral displacement and global stability are the two main stability criteria for soil nail walls. Conventional design methods do not adequately address the deformation behaviour of soil nail walls, owing to the complexity involved in handling a large number of influencing factors. Consequently, limited methods of deformation estimates based on empirical relationships and in situ performance monitoring are available in the literature. It is therefore desirable that numerical techniques and statistical methods are used in order to gain a better insight into the deformation behaviour of soil nail walls. In the present study numerical experiments are conducted using a 2^4 factorial design method. Based on analysis of the maximum lateral deformation and factor-of-safety observations from the numerical experiments, regression models for maximum lateral deformation and factor-of-safety prediction are developed and checked for adequacy. Selection of suitable design factors for the 2^4 factorial design of numerical experiments enabled the use of the proposed regression models over a practical range of soil nail wall heights and in situ soil variability. It is evident from the model adequacy analyses and illustrative example that the proposed regression models provided a reasonably good estimate of the lateral deformation and global factor of safety of the soil nail walls.
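As an illustration of the two-level, four-factor factorial design mentioned in this abstract, the following minimal Python sketch enumerates the 16 runs in coded units (-1/+1) and fits a first-order regression model to the responses. The factor names and the synthetic response are hypothetical placeholders, since the abstract does not state which four design factors were used or report any data, and the study's actual regression models may include further terms.

```python
from itertools import product

# Hypothetical factor names; the abstract does not list the four design factors.
FACTORS = ["A", "B", "C", "D"]


def full_factorial(k):
    """Return all 2**k runs of a two-level design in coded units (-1, +1)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def fit_main_effects(design, responses):
    """Least-squares fit of y = b0 + sum(b_j * x_j).

    In a full two-level factorial the coded columns are mutually orthogonal
    and each sums to zero, so the least-squares coefficients reduce to
    simple averages: b0 = mean(y), b_j = sum(x_ij * y_i) / N.
    """
    n = len(design)
    intercept = sum(responses) / n
    slopes = [
        sum(run[j] * y for run, y in zip(design, responses)) / n
        for j in range(len(design[0]))
    ]
    return intercept, slopes


def predict(intercept, slopes, run):
    """Evaluate the fitted first-order model at one coded design point."""
    return intercept + sum(b * x for b, x in zip(slopes, run))


design = full_factorial(len(FACTORS))

# Synthetic, noise-free response standing in for a simulated wall deformation.
responses = [10.0 + 2.0 * a - 3.0 * c for a, b, c, d in design]

intercept, slopes = fit_main_effects(design, responses)
```

Because the synthetic response depends only on factors A and C, the fitted slopes for B and D come out as zero, which is the kind of check a model adequacy analysis would start from.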
Abstract:
The Baltic Sea is a geologically young, large brackish water basin, and few of the species living there have fully adapted to its special conditions. Many of the species live on the edge of their distribution range in terms of one or more environmental variables such as salinity or temperature. Environmental fluctuations are known to cause fluctuations in population abundance, and this effect is especially strong near the edges of the distribution range, where even small changes in an environmental variable can be critical to the success of a species. This thesis examines which environmental factors are the most important in relation to the success of various commercially exploited fish species in the northern Baltic Sea. It also examines the uncertainties related to fish stocks' current and potential status as well as to their relationship with their environment. The aim is to quantify the uncertainties related to fisheries and environmental management, to find potential management strategies that can be used to reduce uncertainty in management results and to develop methodology related to uncertainty estimation in natural resources management. Bayesian statistical methods are utilized due to their ability to treat uncertainty explicitly in all parts of the statistical model. The results show that uncertainty about important parameters of even the most intensively studied fish species such as salmon (Salmo salar L.) and Baltic herring (Clupea harengus membras L.) is large. On the other hand, management approaches that reduce uncertainty can be found. These include utilising information about ecological similarity of fish stocks and species, and using management variables that are directly related to stock parameters that can be measured easily and without extrapolations or assumptions.
Abstract:
Lead contamination in the environment is of particular concern, as it is a known toxin. Until recently, however, much less attention has been given to the local contamination caused by activities at shooting ranges compared to large-scale industrial contamination. In Finland, more than 500 tons of Pb is produced each year for shotgun ammunition. The contaminant threatens various organisms, ground water and the health of human populations. However, the forest at shooting ranges usually shows no visible sign of stress compared to nearby clean environments. The aboveground biota normally reflects the belowground ecosystem. Thus, the soil microbial communities appear to bear strong resistance to contamination, despite the influence of lead. The studies forming this thesis investigated a shooting range site at Hälvälä in Southern Finland, which is heavily contaminated by lead pellets. Previously it was experimentally shown that the growth of grasses and degradation of litter are retarded. Measurements of acute toxicity of the contaminated soil or soil extracts gave conflicting results, as enchytraeid worms used as toxicity reporters were strongly affected, while reporter bacteria showed no or very minor decreases in viability. Measurements using sensitive inducible luminescent reporter bacteria suggested that the bioavailability of lead in the soil is indeed low, and this notion was supported by the very low water extractability of the lead. Nevertheless, the frequency of lead-resistant cultivable bacteria was elevated based on the isolation of cultivable strains. The bacterial and fungal diversity in heavily lead contaminated shooting sectors were compared with those of pristine sections of the shooting range area. The bacterial 16S rRNA gene and fungal ITS rRNA gene were amplified, cloned and sequenced using total DNA extracted from the soil humus layer as the template. 
Altogether, 917 sequenced bacterial clones and 649 sequenced fungal clones revealed a high soil microbial diversity. No effect of lead contamination was found on bacterial richness or diversity, while fungal richness and diversity significantly differed between lead contaminated and clean control areas. However, even in the case of fungi, genera that were deemed sensitive were not totally absent from the contaminated area: only their relative frequency was significantly reduced. Some operational taxonomic units (OTUs) assigned to Basidiomycota were clearly affected, and were much rarer in the lead contaminated areas. The studies of this thesis surveyed EcM sporocarps, analyzed morphotyped EcM root tips by direct sequencing, and 454-pyrosequenced fungal communities in in-growth bags. A total of 32 EcM fungi that formed conspicuous sporocarps, 27 EcM fungal OTUs from 294 root tips, and 116 EcM fungal OTUs from a total of 8 194 ITS2 454 sequences were recorded. The ordination analyses by non-parametric multidimensional scaling (NMS) indicated that Pb enrichment induced a shift in the EcM community composition. This was visible as indicative trends in the sporocarp and root tip datasets, but explicitly clear in the communities observed in the in-growth bags. The compositional shift in the EcM community was mainly attributable to an increase in the frequencies of OTUs assigned to the genus Thelephora, and to a decrease in the OTUs assigned to Pseudotomentella, Suillus and Tylospora in Pb-contaminated areas when compared to the control. The enrichment of Thelephora in contaminated areas was also observed when examining the total fungal communities in soil using DNA cloning and sequencing technology. While the compositional shifts are clear, their functional consequences for the dominant trees or soil ecosystem remain undetermined. The results indicate that at the Hälvälä shooting range, lead influences the fungal communities but not the bacterial communities. 
The forest ecosystem shows apparent functional redundancy, since no significant effects were seen on forest trees. With 454 pyrosequencing, a single analysis run can now yield up to one million sequences. The technique has been applied in microbial ecology studies to characterize microbial communities. The handling of sequence data with traditional programs is becoming difficult and exceedingly time consuming, and novel tools are needed to handle the vast amounts of data being generated. The field of microbial ecology has recently benefited from the availability of a number of tools for describing and comparing microbial communities using robust statistical methods. However, although these programs provide methods for rapid calculation, it has become necessary to make them more amenable to larger datasets and numbers of samples from pyrosequencing. As part of this thesis, a new program was developed, MuSSA (Multi-Sample Sequence Analyser), to handle sequence data from novel high-throughput sequencing approaches in microbial community analyses. The greatest advantage of the program is that large volumes of sequence data can be manipulated, and general OTU series with a frequency value can be calculated among a large number of samples.
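The kind of multi-sample OTU frequency table described in this abstract can be sketched in a few lines of Python. This is not MuSSA's actual implementation, which the abstract does not describe. The sample names, the OTU labels and the assumption that reads have already been assigned to OTUs are all hypothetical.

```python
from collections import Counter


def otu_table(assignments):
    """Build a samples-by-OTU count table.

    `assignments` maps a sample name to the list of OTU labels assigned to
    its reads (OTU clustering itself is assumed to have been done upstream).
    Returns the sorted OTU labels and, per sample, a row of read counts.
    """
    counts = {sample: Counter(otus) for sample, otus in assignments.items()}
    all_otus = sorted(set().union(*counts.values()))
    table = {
        sample: [counter[otu] for otu in all_otus]
        for sample, counter in counts.items()
    }
    return all_otus, table


def relative_frequencies(row):
    """Convert one row of counts into relative frequencies."""
    total = sum(row)
    return [count / total if total else 0.0 for count in row]


# Hypothetical read assignments for two samples.
reads = {
    "control": ["OTU1", "OTU2", "OTU2", "OTU3"],
    "contaminated": ["OTU1", "OTU1", "OTU1", "OTU2"],
}
otus, table = otu_table(reads)
```

Keeping every sample on the same sorted OTU axis makes the rows directly comparable, which is what downstream richness, diversity or ordination analyses generally expect.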
Abstract:
This work focuses on the role of macroseismology in the assessment of seismicity and probabilistic seismic hazard in Northern Europe. The main type of data under consideration is a set of macroseismic observations available for a given earthquake. The macroseismic questionnaires used to collect earthquake observations from local residents since the late 1800s constitute a special part of the seismological heritage in the region. Information on the earthquakes felt on the coasts of the Gulf of Bothnia between 31 March and 2 April 1883 and on 28 July 1888 was retrieved from contemporary Finnish and Swedish newspapers, while the earthquake of 4 November 1898 GMT is an example of an early systematic macroseismic survey in the region. A data set of more than 1200 macroseismic questionnaires is available for the earthquake in Central Finland on 16 November 1931. Basic macroseismic investigations including preparation of new intensity data point (IDP) maps were conducted for these earthquakes. Previously disregarded usable observations were found in the press. The improved collection of IDPs of the 1888 earthquake shows that this event was a rare occurrence in the area. In contrast to earlier notions it was felt on both sides of the Gulf of Bothnia. The data on the earthquake of 4 November 1898 GMT were augmented with historical background information discovered in various archives and libraries. This earthquake was of some concern to the authorities, because extra fire inspections were conducted in at least three towns, i.e. Tornio, Haparanda and Piteå, located in the centre of the area of perceptibility. This event posed the indirect hazard of fire, although its magnitude of around 4.6 was minor on the global scale. The distribution of slightly damaging intensities was larger than previously outlined. This may have resulted from the amplification of the ground shaking in the soft soil of the coast and river valleys where most of the population was found.
The large data set of the 1931 earthquake provided an opportunity to apply statistical methods and assess methodologies that can be used when dealing with macroseismic intensity. It was evaluated using correspondence analysis. Different approaches such as gridding were tested to estimate the macroseismic field from the intensity values distributed irregularly in space. In general, the characteristics of intensity warrant careful consideration. A more pervasive perception of intensity as an ordinal quantity affected by uncertainties is advocated. A parametric earthquake catalogue comprising entries from both the macroseismic and instrumental era was used for probabilistic seismic hazard assessment. The parametric-historic methodology was applied to estimate seismic hazard at a given site in Finland and to prepare a seismic hazard map for Northern Europe. The interpretation of these results is an important issue, because the recurrence times of damaging earthquakes may well exceed thousands of years in an intraplate setting such as Northern Europe. This application may therefore be seen as an example of short-term hazard assessment.
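One simple way to combine gridding with the view of intensity as an ordinal quantity is sketched below in Python. This is an illustrative assumption, not the procedure used in the work: the cell size, the coordinates and the intensity values are hypothetical, and the choice of the low median as the cell summary is only one of several possible ordinal-safe statistics.

```python
import math
from collections import defaultdict
from statistics import median_low


def grid_intensities(points, cell_deg):
    """Aggregate intensity data points (IDPs) onto a regular lat/lon grid.

    `points` is an iterable of (latitude, longitude, intensity) tuples.
    Each cell is summarised by `median_low`, which always returns one of the
    observed values, so no fractional intensities are invented for an
    ordinal scale.
    """
    cells = defaultdict(list)
    for lat, lon, intensity in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(intensity)
    return {key: median_low(values) for key, values in cells.items()}


# Hypothetical IDPs: (latitude, longitude, intensity).
idps = [
    (62.1, 25.2, 4),
    (62.3, 25.4, 5),
    (62.4, 25.1, 6),
    (62.2, 25.3, 6),
    (63.6, 26.7, 3),
]
gridded = grid_intensities(idps, cell_deg=0.5)
```

An arithmetic mean or an ordinary median of the first cell would give 5.25 or 5.5, values that do not exist on an intensity scale, whereas the low median returns the observed value 5.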
Abstract:
Changes in alcohol pricing have been documented as inversely associated with changes in consumption and alcohol-related problems. Evidence of the association between price changes and health problems is nevertheless patchy and is based to a large extent on cross-sectional state-level data, or time series of such cross-sectional analyses. Natural experimental studies have been called for. There was a substantial reduction in the price of alcohol in Finland in 2004 due to a reduction in alcohol taxes of one third, on average, and the abolition of duty-free allowances for travellers from the EU. These changes in the Finnish alcohol policy could be considered a natural experiment, which offered a good opportunity to study what happens with regard to alcohol-related problems when prices go down. The present study investigated the effects of this reduction in alcohol prices on (1) alcohol-related and all-cause mortality, and mortality due to cardiovascular diseases, (2) alcohol-related morbidity in terms of hospitalisation, (3) socioeconomic differentials in alcohol-related mortality, and (4) small-area differences in interpersonal violence in the Helsinki Metropolitan area. Differential trends in alcohol-related mortality prior to the price reduction were also analysed. A variety of population-based register data was used in the study. Time-series intervention analysis modelling was applied to monthly aggregations of deaths and hospitalisation for the period 1996-2006. These and other mortality analyses were carried out for men and women aged 15 years and over. Socioeconomic differentials in alcohol-related mortality were assessed on a before/after basis, mortality being followed up in 2001-2003 (before the price reduction) and 2004-2005 (after). Alcohol-related mortality was defined in all the studies on mortality on the basis of information on both underlying and contributory causes of death. 
Hospitalisation related to alcohol meant that there was a reference to alcohol in the primary diagnosis. Data on interpersonal violence was gathered from 86 administrative small-areas in the Helsinki Metropolitan area and was also assessed on a before/after basis followed up in 2002-2003 and 2004-2005. The statistical methods employed to analyse these data sets included time-series analysis, and Poisson and linear regression. The results of the study indicate that alcohol-related deaths increased substantially among men aged 40-69 years and among women aged 50-69 after the price reduction when trends and seasonal variation were taken into account. The increase was mainly attributable to chronic causes, particularly liver diseases. Mortality due to cardiovascular diseases and all-cause mortality, on the other hand, decreased considerably among the over-69-year-olds. The increase in alcohol-related mortality in absolute terms among the 30-59-year-olds was largest among the unemployed and early-age pensioners, and those with a low level of education, social class or income. The relative differences in change between the education and social class subgroups were small. The employed and those under the age of 35 did not suffer from increased alcohol-related mortality in the two years following the price reduction. The gap between the age and education groups, which was substantial in the 1980s, thus further broadened. With regard to alcohol-related hospitalisation, there was an increase in both chronic and acute causes among men under the age of 70, and among women in the 50-69-year age group when trends and seasonal variation were taken into account. Alcohol dependence and other alcohol-related mental and behavioural disorders were the largest category in both the total number of chronic hospitalisations and in the increase. There was no increase in the rate of interpersonal violence in the Helsinki Metropolitan area, and even a decrease in domestic violence.
There was a significant relationship between the measures of social disadvantage on the area level and interpersonal violence, although the differences in the effects of the price reduction between the different areas were small. The findings of the present study suggest that a reduction in alcohol prices may lead to a substantial increase in alcohol-related mortality and morbidity. However, large population group differences were observed regarding responsiveness to the price changes. In particular, the less privileged, such as the unemployed, were most sensitive. In contrast, at least in the Finnish context, the younger generations and the employed do not appear to be adversely affected, and those in the older age groups may even benefit from cheaper alcohol in terms of decreased rates of CVD mortality. The results also suggest that reductions in alcohol prices do not necessarily affect interpersonal violence. The population group differences in the effects of the price changes on alcohol-related harm should be acknowledged, and therefore the policy actions should focus on the population subgroups that are primarily responsive to the price reduction.
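For a single before/after comparison of the kind described in this abstract, a Poisson regression with one binary period indicator and log person-years as the offset reduces to a rate ratio with a Wald confidence interval on the log scale. The Python sketch below shows that calculation under this simplifying assumption. The counts and person-years are hypothetical and are not figures from the study, whose models also adjusted for trends, seasonality and population subgroups.

```python
import math


def rate_ratio(deaths_before, py_before, deaths_after, py_after, z=1.96):
    """Before/after rate ratio with a Wald confidence interval.

    Matches the estimate from a Poisson regression with a single binary
    period covariate and log person-years as offset: the standard error of
    the log rate ratio is sqrt(1/deaths_before + 1/deaths_after).
    """
    rate_before = deaths_before / py_before
    rate_after = deaths_after / py_after
    ratio = rate_after / rate_before
    se_log = math.sqrt(1.0 / deaths_before + 1.0 / deaths_after)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts and person-years, not figures from the study.
ratio, lower, upper = rate_ratio(
    deaths_before=400, py_before=1_000_000,
    deaths_after=500, py_after=1_000_000,
)
```

An interval that excludes 1 would indicate a change in the rate between the two periods, although a plain before/after contrast cannot separate the effect of the price reduction from an underlying trend.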
Abstract:
The indigenous cloud forests in the Taita Hills have suffered substantial degradation for several centuries due to agricultural expansion. Currently, only 1% of the original forested area remains preserved in this region. Furthermore, climate change poses an imminent threat to the local economy and environmental sustainability. In such circumstances, elaborating tools to reconcile socioeconomic growth and natural resources conservation is an enormous challenge. This dissertation tackles essential aspects for understanding the ongoing agricultural activities in the Taita Hills and their potential environmental consequences in the future. Initially, alternative methods were designed to improve our understanding of the ongoing agricultural activities. Namely, methods for planning agricultural surveys and estimating evapotranspiration were evaluated, taking into account a number of limitations regarding data and resources availability. Next, this dissertation evaluates how upcoming agricultural expansion, together with climate change, will affect the natural resources in the Taita Hills up to the year 2030. The driving forces of agricultural expansion in the region were identified in order to delineate future landscape scenarios and evaluate potential impacts from the soil and water conservation point of view. In order to investigate these issues and answer the research questions, this dissertation combined state of the art modelling tools with renowned statistical methods. The results indicate that, if current trends persist, agricultural areas will occupy roughly 60% of the study area by 2030. Although the simulated land use changes will certainly increase soil erosion figures, new croplands are likely to come up predominantly in the lowlands, which comprise areas with lower soil erosion potential. By 2030, rainfall erosivity is likely to increase during April and November due to climate change.
Finally, this thesis addressed the potential impacts of agricultural expansion and climate changes on Irrigation Water Requirements (IWR), which is considered another major issue in the context of the relations between land use and climate. Although the simulations indicate that climate change will likely increase annual volumes of rainfall during the following decades, IWR will continue to increase due to agricultural expansion. By 2030, new cropland areas may cause an increase of approximately 40% in the annual volume of water necessary for irrigation.
Abstract:
Statistical methods for optimizing the morphology of oxide-based, bifunctional oxygen electrodes for use in rechargeable metal/air batteries are examined with regard to binder composition, compaction time, and compaction load. Results show that LaNiO3 with PTFE binder in a nickel mesh envelope provides a satisfactory electrode.
Abstract:
OBJECTIVES. Oral foreign language skills are an integral part of one's social, academic and professional competence. This can be problematic for those suffering from foreign language communication apprehension (CA), or a fear of speaking a foreign language. CA manifests itself, for example, through feelings of anxiety and tension, physical arousal and avoidance of foreign language communication situations. According to scholars, foreign language CA may impede the language learning process significantly and have detrimental effects on one's language learning, academic achievement and career prospects. Drawing on upper secondary students' subjective experiences of communication situations in English as a foreign language, this study seeks, first, to describe, analyze and interpret why upper secondary students experience English language communication apprehension in English as a foreign language (EFL) classes. Second, this study seeks to analyse what the most anxiety-arousing oral production tasks in EFL classes are, and which features of different oral production tasks arouse English language communication apprehension and why. The ultimate objectives of the present study are to raise teachers' awareness of foreign language CA and its features, manifestations and impacts in foreign language classes as well as to suggest possible ways to minimize the anxiety-arousing features in foreign language classes. METHODS. The data was collected in two phases by means of six-part Likert-type questionnaires and theme interviews, and analysed using both quantitative and qualitative methods. The questionnaire data was collected in spring 2008. The respondents were 122 first-year upper secondary students, 68 % of whom were girls and 31 % of whom were boys. The data was analysed by statistical methods using SPSS software. The theme interviews were conducted in spring 2009. 
The interviewees were 11 second-year upper secondary students aged 17 to 19, who were chosen by purposeful selection on the basis of their English language CA level measured in the questionnaires. Six interviewees were classified as high apprehensives and five as low apprehensives according to their score in the foreign language CA scale in the questionnaires. The interview data was coded and thematized using the technique of content analysis. The analysis and interpretation of the data drew on a comparison of the self-reports of the highly apprehensive and low apprehensive upper secondary students. RESULTS. The causes of English language CA in EFL classes as reported by the students were both internal and external in nature. The most notable causes were a low self-assessed English proficiency, a concern over errors, a concern over evaluation, and a concern over the impression made on others. Other causes related to a high English language CA were a lack of authentic oral practice in EFL classes, discouraging teachers and negative experiences of learning English, unrealistic internal demands for oral English performance, high external demands and expectations for oral English performance, the conversation partner's higher English proficiency, and the audience's large size and unfamiliarity. The most anxiety-arousing oral production tasks in EFL classes were presentations or speeches with or without notes in front of the class, acting in front of the class, pair debates with the class as audience, expressing thoughts and ideas to the class, presentations or speeches without notes while seated, group debates with the class as audience, and answering the teacher's questions involuntarily. The main features affecting the anxiety-arousing potential of an oral production task were a high degree of attention, a large audience, a high degree of evaluation, little time for preparation, little linguistic support, and a long duration.
Abstract:
The main purpose of this Master's thesis was to find out what kind of attitudes pupils in the 9th grade of Finnish comprehensive school have towards music as a school subject and to compare them with the attitudes of the principals at the school level. The theoretical context of the research is based on earlier studies of the significance of music education in the comprehensive school, the connection between learning and attitudes, and the factors that motivate the study of music. In addition, I have analysed the role of evaluation and assessment from the point of view of developing the educational system, and the role of management and leadership in relation to the pupils' behaviour and attitudes. The data of the research is the Finnish National Board of Education's collected data of the assessment of the learning outcomes of arts education, and it is nationally representative (N = 5056 in phase I and n = 1570 in phase II), covering both the Finnish-language and the Swedish-language pupil data. I have especially concentrated on the items measuring attitudes, certain background variables and the questionnaire of the principals. The numerical data was analyzed using multivariate statistical methods. The results of the research indicate that in general the pupils and the principals think that music is quite significant as a school subject. The girls valued music on average more than the boys when comparing all the dimensions. The differences were systematic but the effect sizes were under 10 %. There were no statistically significant differences between the Finnish-language and the Swedish-language pupils. Comparing the grades of music in the 7th grade, the differences were growing linearly and the effect size was 15.7 %.
There was a positive, statistically significant correlation between the Significance of music and music as a hobby during free time (Active interest in music, Informal interest in music, Taking part in music activities in the school). The strongest correlation was with the Active interest in music variable (r = 0.53, p < .001). The principals also thought that music is important as a school subject considering the development of the pupil and the function of the school. The answers of the pupils did not cluster at the school level and there were no strong correlations between the attitudes of the pupils and the principals. A slight, statistically nearly significant correlation (r = 0.21, p = .011) was found between the principals valuing the Significance of the music for school function and the pupils valuing the Benefits and hobbyism. The role of a well-motivated and active music teacher can be important from this point of view. The most important conclusion of the research was that the significance of music is a very personal, individual-level phenomenon. The results also highlight that in the pupils' opinion the most important thing about music lessons is musical activity and learning as an experience.
Resumo:
In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. 
The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.
Resumo:
This work is a case study of applying nonparametric statistical methods to corpus data. We show how to use ideas from permutation testing to answer linguistic questions related to morphological productivity and type richness. In particular, we study the use of the suffixes -ity and -ness in the 17th-century part of the Corpus of Early English Correspondence within the framework of historical sociolinguistics. Our hypothesis is that the productivity of -ity, as measured by type counts, is significantly low in letters written by women. To test such hypotheses, and to facilitate exploratory data analysis, we take the approach of computing accumulation curves for types and hapax legomena. We have developed an open source computer program which uses Monte Carlo sampling to compute the upper and lower bounds of these curves for one or more levels of statistical significance. By comparing the type accumulation from women’s letters with the bounds, we are able to confirm our hypothesis.
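The idea of comparing an observed type accumulation curve against permutation-based bounds can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the toy data, and the choice to index the curve by number of texts rather than by running words are all assumptions made for illustration.

```python
import random


def type_accumulation(texts):
    """Cumulative number of distinct types after each text, in the given order."""
    seen = set()
    curve = []
    for types in texts:
        seen.update(types)
        curve.append(len(seen))
    return curve


def permutation_bounds(texts, n_permutations=1000, alpha=0.05, seed=0):
    """Monte Carlo lower and upper bounds for the accumulation curve.

    The texts are shuffled repeatedly; at each position the empirical
    alpha/2 and 1 - alpha/2 quantiles of the permuted curves are kept.
    """
    rng = random.Random(seed)
    order = list(texts)
    curves = []
    for _ in range(n_permutations):
        rng.shuffle(order)
        curves.append(type_accumulation(order))
    lo_idx = int((alpha / 2) * n_permutations)
    hi_idx = n_permutations - 1 - lo_idx
    lower, upper = [], []
    for k in range(len(order)):
        column = sorted(curve[k] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


# Hypothetical toy corpus: each "letter" is the set of -ity types it contains.
letters = [
    {"ability", "necessity"},
    {"necessity"},
    {"civility", "ability"},
    {"quality"},
    {"necessity", "quality"},
    {"extremity"},
]
lower, upper = permutation_bounds(letters, n_permutations=500)
```

A subcorpus (for example, the letters of one social group) whose own accumulation curve falls below `lower` at its size would then count as having significantly few types at the chosen level, which is the kind of comparison the abstract describes.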
Resumo:
The aim of the thesis was to study the extent of spatial concentration of the immigrant population in Helsinki and to analyse the impact of housing policy on ethnic residential segregation in 1992-2005. For the purpose of the study, the immigrant population was defined based on the language spoken at home. The theory of residential segregation by Andersson and Molina formed the main theoretical framework for the study. According to Andersson and Molina, ethnic residential segregation results from different dynamic intra-urban migration processes. Institutionally generated migration, i.e. migration patterns generated by various housing and immigrant policies and procedures, is one of the central factors in the development of ethnic segregation. The data of the study consisted of population and housing statistics and housing and immigrant policy documents of Helsinki municipality. Spatial concentration of the immigrant population was studied at both district and building levels using GIS methods and statistical methods. The housing policy of Helsinki municipality was analysed using a method created by Musterd et al., who distinguish two types of policy approaches to residential segregation: spatial dispersion policy and compensating policy. The housing policy of Helsinki has a strong focus on social mixing and spatial dispersion of housing stock. Ethnic segregation is regarded as a threat. The importance of ethnic communities and networks is, however, acknowledged, and small-scale concentration is therefore not considered harmful. Despite the spatial dispersion policy, the immigrant population is concentrated in the eastern, north-eastern and north-western suburbs of Helsinki. The spatial pattern of concentration had already formed at the beginning of the 1990s, when immigration to Finland suddenly peaked. New immigrant groups were housed in the neighbourhoods where public housing was available at the time.
Housing policy, namely the location of new residential areas and public housing blocks and the policies of public housing allocation, was a key factor influencing the residential patterns of the immigrant population in the 1990s. The immigration and refugee policies of the state have also had an impact on this development. The concentration of the immigrant population continued in the same areas at the beginning of the 2000s. Dispersion to new areas has mainly taken place within the eastern and north-eastern parts of the city or in the adjacent areas. The migration patterns of the native population and the reasonably rapid changes in the housing market have emerged as new factors generating and influencing ethnic residential segregation in Helsinki in the 2000s. Due to social mixing and spatial dispersion policies, ethnic segregation in Helsinki has so far been fairly small-scale, concentrated in particular housing blocks. The number of residential buildings with a high share of immigrant population is very modest. However, the number of such buildings doubled between 1996 and 2002. The concentration of the immigrant population mainly concerns the public housing sector. The difference in the level of concentration between the public housing sector and privately owned housing companies is remarkable.