11 resultados para COMPUTER SCIENCE, THEORY
em Helda - Digital Repository of University of Helsinki
Resumo:
This thesis combines a computational analysis of a comprehensive corpus of Finnish lake names with a theoretical background in cognitive linguistics. The combination results on the one hand in a description of the toponymic system and the processes involved in analogy-based naming and on the other hand some adjustments to Construction Grammar. Finnish lake names are suitable for this kind of study, as they are to a large extent semantically transparent even when relatively old. There is also a large number of them, and they are comprehensively collected in a computer database. The current work starts with an exploratory computational analysis of co-location patterns between different lake names. Such an analysis makes it possible to assess the importance of analogy and patterns in naming. Prior research has suggested that analogy plays an important role, often also in cases where there are other motivations for the name, and the current study confirms this. However, it also appears that naming patterns are very fuzzy and that their nature is somewhat hard to define in an essentially structuralist tradition. In describing toponymic structure and the processes involved in naming, cognitive linguistics presents itself as a promising theoretical basis. The descriptive formalism of Construction Grammar seems especially well suited for the task. However, now productivity becomes a problem: it is not nearly as clear-cut as the latter theory often assumes, and this is even more apparent in names than in more traditional linguistic material. The varying degree of productivity is most naturally described by a prototype-based theory. Such an approach, however, requires some adjustments to onstruction Grammar. Based on all this, the thesis proposes a descriptive model where a new name -- or more generally, a new linguistic expression -- can be formed by conceptual integration from either a single prior example or a construction generalised from a number of different prior ones. The new model accounts nicely for various aspects of naming that are problematic for the traditional description based on analogy and patterns.
Resumo:
Malli on logiikassa kytetty abstraktio monille matemaattisille objekteille. Esimerkiksi verkot, ryhmt ja metriset avaruudet ovat malleja. rellisten mallien teoria on logiikan osa-alue, jossa tarkastellaan logiikkojen, formaalien kielten, ilmaisuvoimaa malleissa, joiden alkioiden lukumr on rellinen. Rajoittuminen rellisiin malleihin mahdollistaa tulosten soveltamisen teoreettisessa tietojenksittelytieteess, jonka nkkulmasta logiikan kaavoja voidaan ajatella ohjelmina ja rellisi malleja niiden syttein. Lokaalisuus tarkoittaa logiikan kyvyttmyytt erottaa toisistaan malleja, joiden paikalliset piirteet vastaavat toisiaan. Vitskirjassa tarkastellaan useita lokaalisuuden muotoja ja niiden silymist logiikkoja yhdistelless. Kehitettyj tykaluja apuna kytten osoitetaan, ett Gaifman- ja Hanf-lokaalisuudeksi kutsuttujen varianttien vliss on lokaalisuusksitteiden hierarkia, jonka eri tasot voidaan erottaa toisistaan kasvavaa dimensiota olevissa hiloissa. Toisaalta osoitetaan, ett lokaalisuusksitteet eivt eroa toisistaan, kun rajoitutaan tarkastelemaan rellisi puita. Jrjestysinvariantit logiikat ovat kieli, joissa on kytss sisnrakennettu jrjestysrelaatio, mutta sit on kytettv siten, etteivt kaavojen ilmaisemat asiat riipu valitusta jrjestyksest. Mritelm voi motivoida tietojenksittelyn nkkulmasta: vaikka ohjelman sytteen tietojen jrjestyksell ei olisi odotetun tuloksen kannalta merkityst, on syte tietokoneen muistissa aina jossakin jrjestyksess, jota ohjelma voi laskennassaan hydynt. Vitskirjassa tutkitaan minklaisia lokaalisuuden muotoja jrjestysinvariantit ensimmisen kertaluvun predikaattilogiikan laajennukset yksipaikkaisilla kvanttoreilla voivat toteuttaa. Tuloksia sovelletaan tarkastelemalla, milloin sisnrakennettu jrjestys lis logiikan ilmaisuvoimaa rellisiss puissa.
Resumo:
In this thesis we study a few games related to non-wellfounded and stationary sets. Games have turned out to be an important tool in mathematical logic ranging from semantic games defining the truth of a sentence in a given logic to for example games on real numbers whose determinacies have important effects on the consistency of certain large cardinal assumptions. The equality of non-wellfounded sets can be determined by a so called bisimulation game already used to identify processes in theoretical computer science and possible world models for modal logic. Here we present a game to classify non-wellfounded sets according to their branching structure. We also study games on stationary sets moving back to classical wellfounded set theory. We also describe a way to approximate non-wellfounded sets with hereditarily finite wellfounded sets. The framework used to do this is domain theory. In the Banach-Mazur game, also called the ideal game, the players play a descending sequence of stationary sets and the second player tries to keep their intersection stationary. The game is connected to precipitousness of the corresponding ideal. In the pressing down game first player plays regressive functions defined on stationary sets and the second player responds with a stationary set where the function is constant trying to keep the intersection stationary. This game has applications in model theory to the determinacy of the Ehrenfeucht-Fraisse game. We show that it is consistent that these games are not equivalent.
Resumo:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Resumo:
Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data is available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example are genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from the traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither the statistical dependency nor the statistical significance are monotonic properties, which means that the traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measures. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measures, like Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm is well-scalable, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or if the data still contains better, but undiscovered dependencies.
Resumo:
Ubiquitous computing is about making computers and computerized artefacts a pervasive part of our everyday lifes, bringing more and more activities into the realm of information. The computationalization, informationalization of everyday activities increases not only our reach, efficiency and capabilities but also the amount and kinds of data gathered about us and our activities. In this thesis, I explore how information systems can be constructed so that they handle this personal data in a reasonable manner. The thesis provides two kinds of results: on one hand, tools and methods for both the construction as well as the evaluation of ubiquitous and mobile systems---on the other hand an evaluation of the privacy aspects of a ubiquitous social awareness system. The work emphasises real-world experiments as the most important way to study privacy. Additionally, the state of current information systems as regards data protection is studied. The tools and methods in this thesis consist of three distinct contributions. An algorithm for locationing in cellular networks is proposed that does not require the location information to be revealed beyond the user's terminal. A prototyping platform for the creation of context-aware ubiquitous applications called ContextPhone is described and released as open source. Finally, a set of methodological findings for the use of smartphones in social scientific field research is reported. A central contribution of this thesis are the pragmatic tools that allow other researchers to carry out experiments. The evaluation of the ubiquitous social awareness application ContextContacts covers both the usage of the system in general as well as an analysis of privacy implications. The usage of the system is analyzed in the light of how users make inferences of others based on real-time contextual cues mediated by the system, based on several long-term field studies. The analysis of privacy implications draws together the social psychological theory of self-presentation and research in privacy for ubiquitous computing, deriving a set of design guidelines for such systems. The main findings from these studies can be summarized as follows: The fact that ubiquitous computing systems gather more data about users can be used to not only study the use of such systems in an effort to create better systems but in general to study phenomena previously unstudied, such as the dynamic change of social networks. Systems that let people create new ways of presenting themselves to others can be fun for the users---but the self-presentation requires several thoughtful design decisions that allow the manipulation of the image mediated by the system. Finally, the growing amount of computational resources available to the users can be used to allow them to use the data themselves, rather than just being passive subjects of data gathering.
Resumo:
Reorganizing a dataset so that its hidden structure can be observed is useful in any data analysis task. For example, detecting a regularity in a dataset helps us to interpret the data, compress the data, and explain the processes behind the data. We study datasets that come in the form of binary matrices (tables with 0s and 1s). Our goal is to develop automatic methods that bring out certain patterns by permuting the rows and columns. We concentrate on the following patterns in binary matrices: consecutive-ones (C1P), simultaneous consecutive-ones (SC1P), nestedness, k-nestedness, and bandedness. These patterns reflect specific types of interplay and variation between the rows and columns, such as continuity and hierarchies. Furthermore, their combinatorial properties are interlinked, which helps us to develop the theory of binary matrices and efficient algorithms. Indeed, we can detect all these patterns in a binary matrix efficiently, that is, in polynomial time in the size of the matrix. Since real-world datasets often contain noise and errors, we rarely witness perfect patterns. Therefore we also need to assess how far an input matrix is from a pattern: we count the number of flips (from 0s to 1s or vice versa) needed to bring out the perfect pattern in the matrix. Unfortunately, for most patterns it is an NP-complete problem to find the minimum distance to a matrix that has the perfect pattern, which means that the existence of a polynomial-time algorithm is unlikely. To find patterns in datasets with noise, we need methods that are noise-tolerant and work in practical time with large datasets. The theory of binary matrices gives rise to robust heuristics that have good performance with synthetic data and discover easily interpretable structures in real-world datasets: dialectical variation in the spoken Finnish language, division of European locations by the hierarchies found in mammal occurrences, and co-occuring groups in network data. In addition to determining the distance from a dataset to a pattern, we need to determine whether the pattern is significant or a mere occurrence of a random chance. To this end, we use significance testing: we deem a dataset significant if it appears exceptional when compared to datasets generated from a certain null hypothesis. After detecting a significant pattern in a dataset, it is up to domain experts to interpret the results in the terms of the application.
Resumo:
In a max-min LP, the objective is to maximise subject to Ax 1, Cx 1, and x 0. In a min-max LP, the objective is to minimise subject to Ax 1, Cx 1, and x 0. The matrices A and C are nonnegative and sparse: each row ai of A has at most I positive elements, and each row ck of C has at most K positive elements. We study the approximability of max-min LPs and min-max LPs in a distributed setting; in particular, we focus on local algorithms (constant-time distributed algorithms). We show that for any I 2, K 2, and > 0 there exists a local algorithm that achieves the approximation ratio I (1 1/K) + . We also show that this result is the best possible: no local algorithm can achieve the approximation ratio I (1 1/K) for any I 2 and K 2.