36 results for Empirical Methods in NLP
in Helda - Digital Repository of University of Helsinki
Abstract:
Modern-day weather forecasting is highly dependent on Numerical Weather Prediction (NWP) models as the main data source. The evolving state of the atmosphere with time can be numerically predicted by solving a set of hydrodynamic equations, if the initial state is known. However, such a modelling approach always contains approximations that by and large depend on the purpose of use and resolution of the models. Present-day NWP systems operate with horizontal model resolutions in the range from about 40 km to 10 km. Recently, the aim has been to reach scales of 1 to 4 km operationally. This requires fewer approximations in the model equations, more complex treatment of physical processes and, furthermore, more computing power. This thesis concentrates on the physical parameterization methods used in high-resolution NWP models. The main emphasis is on the validation of the grid-size-dependent convection parameterization in the High Resolution Limited Area Model (HIRLAM) and on a comprehensive intercomparison of radiative-flux parameterizations. In addition, the problems related to wind prediction near the coastline are addressed with high-resolution meso-scale models. The grid-size-dependent convection parameterization is clearly beneficial for NWP models operating with a dense grid. Results show that the current convection scheme in HIRLAM is still applicable down to a 5.6 km grid size. However, with further improved model resolution, the tendency of the model to overestimate strong precipitation intensities increases in all the experiment runs. For the clear-sky longwave radiation parameterization, schemes used in NWP models provide much better results in comparison with simple empirical schemes. On the other hand, for the shortwave part of the spectrum, the empirical schemes are more competitive for producing fairly accurate surface fluxes.
Overall, even the complex radiation parameterization schemes used in NWP models seem to be slightly too transparent for both long- and shortwave radiation in clear-sky conditions. For cloudy conditions, simple cloud correction functions are tested. In the case of longwave radiation, the empirical cloud correction methods provide rather accurate results, whereas for shortwave radiation the benefit is only marginal. Idealised high-resolution two-dimensional meso-scale model experiments suggest that the reason for the observed formation of the afternoon low-level jet (LLJ) over the Gulf of Finland is an inertial oscillation mechanism, when the large-scale flow is from the south-east or west directions. The LLJ is further enhanced by the sea-breeze circulation. A three-dimensional HIRLAM experiment, with a 7.7 km grid size, is able to generate an LLJ flow structure similar to that suggested by the 2D experiments and observations. It is also pointed out that improved model resolution does not necessarily lead to better wind forecasts in the statistical sense. In nested systems, the quality of the large-scale host model is particularly important, especially if the inner meso-scale model domain is small.
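The inertial oscillation mechanism named in this abstract has a characteristic period set by the Coriolis parameter: T = 2π/|f|, with f = 2Ω sin φ. The following minimal sketch computes that period. The function name is illustrative, and the latitude of 60°N used in the example is an approximate value assumed here for the Gulf of Finland; it is not taken from the thesis.

```python
import math

EARTH_ROTATION_RATE = 7.2921e-5  # rad/s, angular velocity of the Earth


def inertial_period_hours(latitude_deg):
    """Return the inertial oscillation period in hours at a given latitude."""
    # Coriolis parameter f = 2 * Omega * sin(latitude)
    f = 2.0 * EARTH_ROTATION_RATE * math.sin(math.radians(latitude_deg))
    if f == 0.0:
        raise ValueError("inertial period is undefined at the equator")
    return 2.0 * math.pi / abs(f) / 3600.0


if __name__ == "__main__":
    # 60 degrees north is an assumed, approximate latitude for the Gulf of Finland
    print(round(inertial_period_hours(60.0), 1))
```

At that assumed latitude the period comes out at just under 14 hours, which is the time scale on which such an oscillation would evolve over a day.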
Abstract:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation to previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques in the end. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected think lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on the average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. 
In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
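A polytomous model of this kind assigns each candidate lexeme a probability given the contextual features of a sentence. The sketch below uses one common formulation, the multinomial (softmax) logit; the abstract does not say which polytomous variant the dissertation adopts, and the feature names and weights here are hypothetical values chosen only for illustration.

```python
import math

LEXEMES = ["ajatella", "miettiä", "pohtia", "harkita"]

# Hypothetical weights per lexeme; a real model would estimate these from a corpus.
WEIGHTS = {
    "ajatella": {"intercept": 1.0, "human_agent": 0.2, "abstract_patient": -0.3},
    "miettiä": {"intercept": 0.3, "human_agent": 0.1, "abstract_patient": 0.4},
    "pohtia": {"intercept": 0.0, "human_agent": -0.2, "abstract_patient": 0.9},
    "harkita": {"intercept": -0.2, "human_agent": 0.0, "abstract_patient": 0.1},
}


def predict_proba(features):
    """Return a probability for each lexeme given a set of active binary features."""
    scores = {
        lexeme: w["intercept"] + sum(w.get(name, 0.0) for name in features)
        for lexeme, w in WEIGHTS.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {lexeme: math.exp(score - top) for lexeme, score in scores.items()}
    total = sum(exps.values())
    return {lexeme: value / total for lexeme, value in exps.items()}


if __name__ == "__main__":
    print(predict_proba({"abstract_patient"}))
```

Because the output is a full probability distribution, it matches the probabilistic view described above: most contexts yield graded preferences among the four verbs instead of a single categorical choice.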
Abstract:
The aim of the present study was to advance the methodology and use of time series analysis to quantify dynamic structures in psychophysiological processes and thereby to produce information on spontaneously coupled physiological responses and their behavioral and experiential correlates. Series of analyses using both simulated and empirical cardiac (IBI), electrodermal (EDA), and facial electromyographic (EMG) data indicated that, despite potential autocorrelated structures, smoothing increased the reliability of detecting response coupling from an interindividual distribution of intraindividual measures and that especially the measures of covariance produced accurate information on the extent of coupled responses. This methodology was applied to analyze spontaneously coupled IBI, EDA, and facial EMG responses and vagal activity in their relation to emotional experience and personality characteristics in a group of middle-aged men (n = 37) during the administration of the Rorschach testing protocol. The results revealed new characteristics in the relationship between phasic end-organ synchronization and vagal activity, on the one hand, and individual differences in emotional adjustment to novel situations on the other. Specifically, it appeared that the vagal system is intimately related to emotional and social responsivity. It was also found that the lack of spontaneously synchronized responses is related to decreased energetic arousal (e.g., depression, mood). These findings indicate that the present process analysis approach has many advantages for use in both experimental and applied research, and that it is a useful new paradigm in psychophysiological research. Keywords: Autonomic Nervous System; Emotion; Facial Electromyography; Individual Differences; Spontaneous Responses; Time Series Analysis; Vagal System
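The two ingredients highlighted in this abstract, smoothing and covariance as a measure of response coupling, can be sketched as follows. This is a minimal illustration only: the centred moving average, the window length, and the sample covariance estimator are assumptions made here, since the abstract does not specify the smoothing method or the exact coupling measures used.

```python
def moving_average(series, window=3):
    """Smooth a series with a centred moving average, shrinking the window at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def covariance(x, y):
    """Sample covariance of two equally long series (n - 1 in the denominator)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def coupling(x, y, window=3):
    """Covariance of two series after smoothing each with the same window."""
    return covariance(moving_average(x, window), moving_average(y, window))
```

Computing `coupling` per participant for, say, an IBI and an EDA series would yield the intraindividual measures whose interindividual distribution the study examines.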
Abstract:
The quantification and characterisation of soil phosphorus (P) is of agricultural and environmental importance, and different extraction methods are widely used to assess the bioavailability of P and to characterize soil P reserves. However, the large variety of extractants, pre-treatments and sample preparation procedures complicates the comparison of published results. In order to improve our understanding of the behaviour and cycling of P in soil, it is crucial to know the scientific relevance of the methods used for various purposes. The knowledge of the factors affecting the analytical outcome is a prerequisite for justified interpretation of the results. The aim of this thesis was to study the effects of sample preparation procedures on soil P and to determine the dependence of the recovered P pool on the chemical nature of extractants. Sampling is a critical step in soil testing and sampling strategy is dependent on the land-use history and the purpose of sampling. This study revealed that pre-treatments changed soil properties and air-drying was found to affect soil P, particularly extractable organic P, by disrupting organic matter. This was evidenced by an increase in the water-extractable small-sized (<0.2 µm) P that, at least partly, took place at the expense of the large-sized (>0.2 µm) P. However, freezing induced only insignificant changes and thus, freezing can be taken to be a suitable method for storing soils from the boreal zone that naturally undergo periodic freezing. The results demonstrated that the chemical nature of the extractant affects its sensitivity to detect changes in soil P solubility. Buffered extractants obscured the alterations in P solubility induced by pH changes; however, water extraction, though sensitive to physicochemical changes, can be used to reveal short-term changes in soil P solubility.
As for the organic P, the analysis was found to be sensitive to the sample preparation procedures: filtering may leave a large proportion of extractable organic P undetected, whereas the outcome of centrifugation was found to be affected by the ionic strength of the extractant. Widely used sequential fractionation procedures proved to be able to detect land-use-derived differences in the distribution of P among fractions of different solubilities. However, interpretation of the results from extraction experiments requires better understanding of the biogeochemical function of the recovered P fraction in the P cycle in differently managed soils under dissimilar climatic conditions.
Abstract:
It is well known that an integrable (in the sense of Arnold-Jost) Hamiltonian system gives rise to quasi-periodic motion with trajectories running on invariant tori. These tori foliate the whole phase space. If we perturb an integrable system, the Kolmogorov-Arnold-Moser (KAM) theorem states that, provided that a non-degeneracy condition holds and the perturbation is sufficiently small, most of the invariant tori carrying quasi-periodic motion persist, getting only slightly deformed. The measure of the persisting invariant tori grows as the size of the perturbation decreases. In the first part of the thesis we shall use a Renormalization Group (RG) scheme in order to prove the classical KAM result in the case of a non-analytic perturbation (the latter will only be assumed to have continuous derivatives up to a sufficiently large order). We shall proceed by solving a sequence of problems in which the perturbations are analytic approximations of the original one. We will finally show that the approximate solutions converge to a differentiable solution of our original problem. In the second part we will use an RG scheme using continuous scales, so that instead of solving an iterative equation as in the classical RG KAM, we will end up solving a partial differential equation. This will allow us to reduce the complications of treating a sequence of iterative equations to the use of the Banach fixed-point theorem in a suitable Banach space.
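For orientation, the setting described in this abstract is usually written as follows. This is the standard textbook formulation in action-angle variables; the symbols (h, f, ε, ω, γ, τ) are notation assumed here and are not taken from the thesis itself.

```latex
\begin{align*}
  % Nearly integrable Hamiltonian in action-angle variables
  H(I,\theta) &= h(I) + \varepsilon f(I,\theta),
    \qquad (I,\theta) \in \mathbb{R}^n \times \mathbb{T}^n, \\
  % Unperturbed flow (\varepsilon = 0): quasi-periodic motion on the torus I = I_0
  I(t) &= I_0, \qquad \theta(t) = \theta_0 + \omega(I_0)\,t,
    \qquad \omega(I) := \partial_I h(I), \\
  % Kolmogorov non-degeneracy condition
  \det \partial_I^2 h(I) &\neq 0, \\
  % Diophantine condition on the frequencies of the tori that persist
  |\omega \cdot k| &\geq \frac{\gamma}{|k|^{\tau}}
    \qquad \text{for all } k \in \mathbb{Z}^n \setminus \{0\}.
\end{align*}
```

In this formulation the tori that survive a small perturbation are those whose frequency vectors satisfy the Diophantine condition for suitable constants γ > 0 and τ.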
Abstract:
The aim of this study was to evaluate and test methods which could improve local estimates of a general model fitted to a large area. In the first three studies, the intention was to divide the study area into sub-areas that were as homogeneous as possible according to the residuals of the general model, and in the fourth study, the localization was based on the local neighbourhood. According to spatial autocorrelation (SA), points closer together in space are more likely to be similar than those that are farther apart. Local indicators of SA (LISAs) test the similarity of data clusters. A LISA was calculated for every observation in the dataset, and together with the spatial position and residual of the global model, the data were segmented using two different methods: classification and regression trees (CART) and the multiresolution segmentation algorithm (MS) of the eCognition software. The general model was then re-fitted (localized) to the formed sub-areas. In kriging, the SA is modelled with a variogram, and the spatial correlation is a function of the distance (and direction) between the observation and the point of calculation. A general trend is corrected with the residual information of the neighbourhood, whose size is controlled by the number of the nearest neighbours. Nearness is measured as Euclidean distance. With all methods, the root mean square errors (RMSEs) were lower, but with the methods that segmented the study area, the deviation among the individual localized RMSEs was wide. Therefore, an element capable of controlling the division or localization should be included in the segmentation-localization process. Kriging, on the other hand, provided stable estimates when the number of neighbours was sufficient (over 30), thus offering the best potential for further studies. Even CART could be combined with kriging or non-parametric methods, such as most similar neighbours (MSN).
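The idea of correcting a general trend with residuals from the k nearest neighbours can be sketched as below. Note that this is a deliberately simplified stand-in, not kriging: it weights the neighbours equally, whereas kriging derives the weights from a fitted variogram. The function name, the data layout, and the default k are assumptions made for illustration.

```python
import math


def knn_residual_correction(point, global_prediction, observations, k=3):
    """Localize a global-model prediction with the mean residual of the k nearest observations.

    observations is a list of (x, y, residual) tuples, where residual is the
    observed value minus the global-model prediction at that location.
    """
    if not observations:
        return global_prediction
    # Nearness is measured as Euclidean distance, as in the abstract.
    nearest = sorted(
        observations, key=lambda obs: math.dist(point, (obs[0], obs[1]))
    )[:k]
    mean_residual = sum(obs[2] for obs in nearest) / len(nearest)
    return global_prediction + mean_residual


if __name__ == "__main__":
    obs = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (10.0, 10.0, -50.0)]
    print(knn_residual_correction((0.2, 0.0), 10.0, obs, k=2))
```

With too few neighbours a single unusual residual dominates the correction, which is consistent with the abstract's observation that estimates became stable only when the number of neighbours was sufficient.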
Abstract:
There is an increasing need to compare the results obtained with different methods of estimation of tree biomass in order to reduce the uncertainty in the assessment of forest biomass carbon. In this study, tree biomass was investigated in a 30-year-old Scots pine (Pinus sylvestris) stand (Young-Stand) and a 130-year-old mixed Norway spruce (Picea abies)-Scots pine stand (Mature-Stand) located in southern Finland (61°50' N, 24°22' E). In particular, a comparison of the results of different estimation methods was conducted to assess the reliability and suitability of their applications. For the trees in Mature-Stand, annual stem biomass increment fluctuated following a sigmoid equation, and the fitting curves reached a maximum level (from about 1 kg/yr for understorey spruce to 7 kg/yr for dominant pine) when the trees were 100 years old. Tree biomass was estimated to be about 70 Mg/ha in Young-Stand and about 220 Mg/ha in Mature-Stand. In the region (58.00-62.13 °N, 14-34 °E, ≤ 300 m a.s.l.) surrounding the study stands, the tree biomass accumulation in Norway spruce and Scots pine stands followed a sigmoid equation with stand age, with a maximum of 230 Mg/ha at the age of 140 years. In Mature-Stand, lichen biomass on the trees was 1.63 Mg/ha with more than half of the biomass occurring on dead branches, and the standing crop of litter lichen on the ground was about 0.09 Mg/ha. There were substantial differences among the results estimated by different methods in the stands. These results imply that a possible estimation error should be taken into account when calculating tree biomass in a stand with an indirect approach.
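The abstract reports that regional biomass accumulation followed a sigmoid equation with stand age but does not give the equation. As a rough illustration of what such a curve looks like, the sketch below uses a logistic function with the reported upper level of 230 Mg/ha; the logistic form, the midpoint age, and the rate are hypothetical choices made here and are not fitted values from the study.

```python
import math


def logistic_biomass(age_years, max_biomass=230.0, midpoint_age=50.0, rate=0.08):
    """Stand biomass (Mg/ha) as a logistic function of stand age.

    max_biomass is the upper level reported in the abstract; midpoint_age and
    rate are hypothetical parameters used only to draw an example curve.
    """
    return max_biomass / (1.0 + math.exp(-rate * (age_years - midpoint_age)))


if __name__ == "__main__":
    for age in (10, 30, 50, 100, 140):
        print(age, round(logistic_biomass(age), 1))
```

A curve of this shape rises slowly in young stands, fastest around the midpoint age, and flattens as it approaches the upper level in old stands.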
Abstract:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem because, as the amount of natural language text in digital format grows, the need for novel methods for pinpointing important knowledge in vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns.
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
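The edit distance on which the similarity metric is based can be computed with the standard dynamic-programming recurrence. The sketch below is the plain Levenshtein distance with unit costs, applied to arbitrary sequences so that it works on token lists as well as strings; the abstract does not state which cost scheme or normalization the thesis uses, so those details are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion from a
                    current[j - 1] + 1,  # insertion into a
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical patterns mixing plain words and part-of-speech tags
    print(edit_distance(["born", "in", "NNP"], ["born", "in", "CD"]))
```

Treating each pattern as a sequence of tokens (words, part-of-speech tags, punctuation marks) lets the same distance drive both the alignment of two patterns and the pairwise comparisons needed for hierarchical clustering.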
Abstract:
The thesis examines urban issues arising from the transformation from state socialism to a market economy. The main topics are residential differentiation, i.e., uneven spatial distribution of social groups across urban residential areas, and the effects of housing policy and town planning on urban development. The case study is development in Tallinn, the capital city of Estonia, in the context of development of Central and Eastern European cities under and after socialism. The main body of the thesis consists of four separately published refereed articles. The research question that brings the articles together is how the residential (socio-spatial) pattern of cities developed during the state socialist period and how and why that pattern has changed since the transformation to a market economy began. The first article reviews the literature on residential differentiation in Budapest, Prague, Tallinn and Warsaw under state socialism from the viewpoint of the role of housing policy in the processes of residential differentiation at various stages of the socialist era. The paper shows how the socialist housing provision system produced socio-occupational residential differentiation directly and indirectly and it describes how the residential patterns of these cities developed. The second article is critical of oversimplified accounts of rapid reorganisation of the overall socio-spatial pattern of post-socialist cities and of claims that residential mobility has had a straightforward role in it. The Tallinn case study, consisting of an analysis of the distribution of socio-economic groups across eight city districts and over four housing types in 1999 as well as examining the role of residential mobility in differentiation during the 1990s, provides contrasting evidence. The third article analyses the role and effects of housing policies in Tallinn's residential differentiation.
The focus is on contemporary post-privatisation housing-policy measures and their effects. The article shows that the Estonian housing policies do not even aim to reduce, prevent or slow down the harmful effects of the considerable income disparities that are manifest in housing inequality and residential differentiation. The fourth article examines the development of Tallinn's urban planning system in 1991-2004 from the viewpoint of what means it has provided the city with to intervene in urban development and how the city has used these tools. The paper finds that despite some recent progress in planning, its role in guiding where and how the city actually developed has so far been limited. Tallinn's urban development is instead initiated and driven by private agents seeking profit from their investment in land. The thesis includes original empirical research in the three articles that analyse development since socialism. The second article employs quantitative data and methods, primarily index calculation, whereas the third and the fourth ones draw from a survey of policy documents combined with interviews with key informants. Keywords: residential differentiation, housing policy, urban planning, post-socialist transformation, Estonia, Tallinn