9 results for Relative complexity
in Helda - Digital Repository of University of Helsinki
Abstract:
This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work is at the same time linguistic typology and Uralic studies. The findings based on a large corpus of empirical Erzya data, which was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss the previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns which vary from totally analytic to the morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class-membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. The following three predication strategies in Erzya nonverbal predication can be identified: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that even though predicativity must not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other also in predicate position. The relative frequency and degree of obligation for using the predicative suffix construction decreases when moving left to right on the scale verb > adjective/locative > noun (> identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya, but if it is replaced it is most likely to be with a zero-copula construction in a nominal predication.
This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.
Abstract:
Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms rely on asymptotic analysis and with the first two model classes we also discuss how to compute exact rational number solutions.
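As an illustration of what computing the NML for a multinomial model involves, here is a minimal Python sketch. It is not taken from the thesis: it uses a linear-time recurrence for the multinomial normalizing sum known from the NML literature, C(K+2, n) = C(K+1, n) + (n/K) C(K, n), and all function names are hypothetical. Results are floating-point approximations, and the stochastic complexity is returned in nats.

```python
import math


def _xlogx_ratio(h: int, n: int) -> float:
    """Return h * log(h / n), using the convention 0 * log(0) = 0."""
    return 0.0 if h == 0 else h * math.log(h / n)


def binary_normalizer(n: int) -> float:
    """C(2, n): sum over h of binom(n, h) (h/n)^h ((n-h)/n)^(n-h).

    Each term is evaluated in log space to avoid overflow in binom(n, h).
    """
    total = 0.0
    for h in range(n + 1):
        log_binom = (math.lgamma(n + 1) - math.lgamma(h + 1)
                     - math.lgamma(n - h + 1))
        total += math.exp(log_binom + _xlogx_ratio(h, n)
                          + _xlogx_ratio(n - h, n))
    return total


def multinomial_normalizer(num_values: int, n: int) -> float:
    """Normalizing sum C(K, n) of the multinomial NML distribution."""
    if num_values < 1 or n < 0:
        raise ValueError("need num_values >= 1 and n >= 0")
    c_prev = 1.0  # C(1, n): a one-valued variable has a single outcome
    if num_values == 1:
        return c_prev
    c_curr = binary_normalizer(n)  # C(2, n)
    for k in range(1, num_values - 1):
        # Recurrence: C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n)
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def stochastic_complexity(counts: list[int]) -> float:
    """Negative log NML probability (in nats) of data with the given counts."""
    n = sum(counts)
    neg_log_max_likelihood = -sum(_xlogx_ratio(h, n) for h in counts)
    return neg_log_max_likelihood + math.log(
        multinomial_normalizer(len(counts), n))
```

For example, `multinomial_normalizer(3, 2)` evaluates to approximately 4.5, which matches summing the maximized likelihoods of all nine length-2 sequences over three symbols by hand.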
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
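For reference, the standard definition of the NML distribution for discrete data can be written as follows. The notation is an assumption, not taken from the dissertation: $x^n$ is the observed sample of size $n$, $\mathcal{M}$ is the model class, and $\hat{\theta}(\cdot)$ is the maximum likelihood estimator.

```latex
P_{\mathrm{NML}}(x^n \mid \mathcal{M})
  = \frac{P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}
         {\sum_{y^n} P\bigl(y^n \mid \hat{\theta}(y^n), \mathcal{M}\bigr)},
\qquad
\mathrm{SC}(x^n \mid \mathcal{M})
  = -\log P_{\mathrm{NML}}(x^n \mid \mathcal{M}).
```

The sum in the denominator ranges over every possible data set $y^n$ of size $n$, so the number of terms grows exponentially with $n$. For continuous data the sum becomes an integral over the sample space.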
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs results that are interpretable, and what is considered interpretable in data mining can be very different from what is considered interpretable in linear algebra. The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability (the factor matrices are of the same type as the original matrix) and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
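To make the difference between Boolean and normal matrix multiplication concrete, here is a minimal Python sketch. It is not one of the thesis's algorithms: it only evaluates a given pair of binary factors, and the function names and the error measure (the number of mismatched cells) are illustrative assumptions.

```python
Matrix = list[list[int]]


def boolean_product(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is OR over k of a[i][k] AND b[k][j]."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[int(any(a[i][k] and b[k][j] for k in range(inner)))
             for j in range(cols)]
            for i in range(len(a))]


def integer_product(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product over the integers, for comparison."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(len(a))]


def reconstruction_error(x: Matrix, b: Matrix, c: Matrix) -> int:
    """Count the cells where x differs from the Boolean product of b and c."""
    approx = boolean_product(b, c)
    return sum(x[i][j] != approx[i][j]
               for i in range(len(x))
               for j in range(len(x[0])))


if __name__ == "__main__":
    left = [[1, 1], [0, 1]]
    right = [[1, 0], [1, 1]]
    print(boolean_product(left, right))  # stays binary
    print(integer_product(left, right))  # contains an entry larger than 1
```

With these two factors the ordinary product contains the entry 2, so it is no longer a binary matrix, whereas the Boolean product remains of the same type as its factors.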
Abstract:
We have presented an overview of the FSIG approach and have related FSIG grammars to issues of very low complexity and parsing strategy. We ended up with serious optimism that most FSIG grammars could be decomposed in a reasonable way and then processed efficiently.
Abstract:
This study contributes to the neglect effect literature by looking at the relative trading volume in terms of value. The results for the Swedish market show a significant positive relationship between the accuracy of estimation and the relative trading volume. Market capitalisation and analyst coverage have been used as proxies for neglect in prior studies. These measures, however, do not take into account the effort analysts put in when estimating corporate pre-tax profits. I also find evidence that the industry of the firm influences the accuracy of estimation. In addition, supporting earlier findings, loss-making firms are associated with larger forecasting errors. Further, I find that the average forecast error in Sweden increased in the year 2000.
Abstract:
Uveal melanoma (UM) is the second most common primary intraocular cancer worldwide. It is a relatively rare cancer, but still the second most common type of primary malignant melanoma in humans. UM is a slowly growing tumor that gives rise to distant metastasis, mainly to the liver, via the bloodstream. About 40% of patients with UM die of metastatic disease within 10 years of diagnosis, irrespective of the type of treatment. During the last decade, two main lines of research have aimed to achieve enhanced understanding of the metastasis process and accurate prognosis of patients with UM. One emphasizes the characteristics of tumor cells, particularly their nucleoli, and markers of proliferation, and the other the characteristics of tumor blood vessels. Of several morphometric measurements, the mean diameter of the ten largest nucleoli (MLN) has become the most widely applied. A large MLN has consistently been associated with a high likelihood of dying from UM. Blood vessels are of paramount importance in metastasis of UM. Different extravascular matrix patterns, such as loops and networks, can be seen in UM. Their presence is associated with death from metastatic melanoma. However, the density of microvessels is also of prognostic importance. This study was undertaken to help understand some histopathological factors that might contribute to the development of metastasis in UM patients. Factors that could be related to tumor progression to metastatic disease, namely nucleolar size, MLN, microvascular density (MVD), cell proliferation, and the insulin-like growth factor 1 receptor (IGF-1R), were investigated. The primary aim of this thesis was to study the relationship between prognostic factors such as tumor cell nucleolar size, proliferation, extravascular matrix patterns, and dissemination of UM, and to assess to what extent there is a relationship to metastasis.
The secondary goal was to develop a multivariate model which includes MLN and cell proliferation in addition to MVD, and which would fit better with population-based, melanoma-related survival data than previous models. I studied 167 patients with UM who developed metastasis, even after a very long time following removal of the eye. Metastatic disease was the main cause of death, as documented in the Finnish Cancer Registry and on death certificates. Using an independent population-based data set, it was confirmed that MLN and extravascular matrix loops and networks were unrelated, independent predictors of survival in UM. It was also found that multivariate models including MVD in addition to MLN fitted significantly better with survival data than models which excluded MVD. This supports the idea that the characteristics of both the blood vessels and the cells are important, and a future direction would be to examine whether the gene expression profile is associated more with MVD or with MLN. The former relates to the host response to the tumor and may not be as tightly associated with the gene expression profile, yet it is most likely involved in the process of hematogenous metastasis. Because fresh tumor material is needed for reliable genetic analysis, such analysis could not be performed. Although noninvasive detection of certain extravascular matrix patterns is now technically possible in managing patients with UM, this study and tumor genetics suggest that such noninvasive methods will not fully capture the process of clinical metastasis. Progress in resection and biopsy techniques is likely to yield, in the near future, fresh material with which the ophthalmic pathologist can correlate angiographic data, histopathological characteristics such as MLN, and genetic data. This study, in which cell proliferation in UM was assessed by Ki-67 immunoreactivity, supported the theory that tumors containing epithelioid cells grow faster and have a poorer prognosis.
The cell proliferation index fitted the survival data best when combined with MVD, MLN, and the presence of epithelioid cells. Analogous to the finding that high MVD in primary UM is associated with a shorter time to metastasis than low MVD, high MVD in hepatic metastasis tends to be associated with shorter survival after diagnosis of metastasis. Because the liver is the main organ for metastasis from UM, growth factors largely produced in the liver, namely hepatocyte growth factor, epidermal growth factor and insulin-like growth factor-1 (IGF-1), together with their receptors, may have a role in the homing and survival of metastatic cells. Therefore, the association between immunoreactivity for IGF-1R in primary UM and metastatic death was studied. It was found that immunoreactivity for IGF-1R did not independently predict metastasis from primary UM in my series.
Abstract:
In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. 
The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.