157 results for Datavetenskap (datalogi)
Abstract:
Public-key cryptography has already been in use in Finland for some time, for example on HST cards, which carry a citizen certificate issued by the Population Register Centre (Väestörekisterikeskus). With the citizen certificate one can, among other things, make legally binding digital signatures and encrypt e-mail. Recently, various mobile phone operators have also begun to offer a mobile citizen certificate, in which the private key is stored on the SIM card of the mobile phone. This Master's thesis first describes public-key cryptography and its areas of application. In addition, a few public-key algorithms are mentioned, along with problems related to public-key methods. After this, the ETSI MSS specification is described. The ETSI MSS specification defines a service interface through which a digital signature is requested from the end user. Finally, Valimo Wireless Oy's implementation of the ETSI MSS specification is described.
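The public-key principle behind such signatures can be illustrated with a deliberately toy example. The sketch below uses tiny hard-coded RSA parameters and a hash-then-sign scheme for demonstration only; it has no relation to the actual HST or Valimo implementations and is wholly insecure:

```python
import hashlib

# Toy RSA sign/verify with tiny, hard-coded primes: an illustration of the
# public-key principle only, not the citizen-certificate implementation.
p, q = 61, 53
n = p * q              # modulus (3233)
phi = (p - 1) * (q - 1)
e = 17                 # public exponent
d = pow(e, -1, phi)    # private exponent (modular inverse of e)

def sign(message: bytes) -> int:
    # Hash the message, then "encrypt" the digest with the private key.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    # Recover the digest with the public key and compare.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

sig = sign(b"legally binding statement")
print(verify(b"legally binding statement", sig))  # True
print(verify(b"tampered statement", sig))
```

In a real deployment the private exponent would live on the SIM card or smart card and never leave it; only the signature request and response cross the service interface.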
Abstract:
Music information retrieval is an interdisciplinary field whose most interesting applications include query-by-humming engines. The core questions of symbolic, i.e. score-based, music information retrieval are when melodies resemble each other in some musical sense, and how this resemblance can be modelled and measured. If approximate occurrences of a melody are searched for in a MIDI note database, a distance measure is needed that tells which transformations keep a melody essentially the same and which count as small or large errors. This thesis explores the tonal dimension of music. The fundamental manifestation of tonality in Western music is the system of keys. Keys can be identified fairly reliably with an algorithm based on dynamic programming, in which the set of pitches occurring in a note segment is compared with the expectations associated with different keys. It is observed, however, that from the viewpoint of harmonic analysis the method is somewhat inaccurate, because it does not take into account the significance of the interaction of harmonic chords. For distance comparisons of melodies, the most critical property of tonality is the relativity of human pitch perception: absolute pitches are, on average, perceived rather inaccurately, whereas the intervals between pitches are perceived accurately. The thesis presents a scale-degree alphabet whose strings can represent monophonic melodies in alternative relative encodings, and discusses in general the problems of measuring pitch relations, both for monophonic melody retrieval and for comparisons of polyphonic and homophonic textures.
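The key identification step can be sketched in a much-simplified, single-segment form. The profile values below are the classic Krumhansl-Kessler ratings; the thesis's dynamic-programming formulation over many segments is not reproduced here:

```python
# Template-based key identification for one note segment: the pitch classes
# occurring in the segment are scored against a profile for each of the 24
# major/minor keys, and the best-scoring key wins.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def find_key(midi_notes):
    # Count how often each pitch class occurs in the segment.
    counts = [0] * 12
    for note in midi_notes:
        counts[note % 12] += 1
    best, best_score = None, float("-inf")
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            # Rotate the profile so that `tonic` is scale degree 1, then
            # score the segment with a simple dot product.
            score = sum(counts[(tonic + i) % 12] * profile[i] for i in range(12))
            if score > best_score:
                best, best_score = f"{NAMES[tonic]} {mode}", score
    return best

# An ascending C major scale should be recognized as C major.
print(find_key([60, 62, 64, 65, 67, 69, 71, 72]))
```

Note how close the runner-up (A minor, the relative minor) scores on this input; this fragility on short segments is one reason the thesis criticizes purely pitch-set-based methods.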
Abstract:
Free and open source software development is an alternative to traditional software engineering as an approach to the development of complex software systems. It is a way of developing software based on geographically distributed teams of volunteers, without an apparent central plan or traditional mechanisms of coordination. The purpose of this thesis is to summarize the current knowledge about free and open source software development and to explore the ways in which further understanding of it could be gained. The results of research in the field, as well as the research methods, are introduced and discussed. The adaptation of software process metrics to the context of free and open source software development is also illustrated, and the possibilities of utilizing them as tools to validate other research are discussed.
Abstract:
A straightforward implementation of multiple alignment takes exponential time and space. Therefore many heuristic methods have been developed. In this thesis I concentrate on one class of heuristic multiple alignment methods: progressive multiple alignment that proceeds according to a phylogenetic tree. The method has been developed for global alignment; on the local alignment side there are some methods along similar lines. I have applied progressive, phylogenetic-tree-guided multiple alignment to the EEL program, which searches locally for gene regulatory regions. Since there is no clearly good way to generalize the alignment scoring used by the EEL program, I have implemented three different scoring schemes in my progressive multiple alignment: sum of pairs, a scoring generalized from pairwise alignment scoring, and a tree-based scoring that uses representative sequences. Using several sequences, as multiple alignment makes possible, refines the regulatory regions found with pairwise alignment and strengthens the significance of conserved regions.
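The first of the three scoring schemes, sum of pairs, can be sketched directly. The match/mismatch/gap scores below are invented for illustration and are not EEL's actual scoring:

```python
# Illustrative sum-of-pairs scoring of a gapped multiple alignment: every
# column contributes the sum of pairwise scores over all pairs of rows.
MATCH, MISMATCH, GAP = 2, -1, -2

def pair_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0          # a gap aligned with a gap contributes nothing
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def sum_of_pairs(alignment):
    total = 0
    for column in zip(*alignment):
        for i in range(len(column)):
            for j in range(i + 1, len(column)):
                total += pair_score(column[i], column[j])
    return total

alignment = ["ACGT-", "AC-TA", "AGGTA"]
print(sum_of_pairs(alignment))  # 8
```

The cost of this exhaustive pairing over every column, quadratic in the number of sequences, is exactly what motivates the cheaper tree-based scoring with representative sequences.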
Abstract:
In recent years, XML has been accepted as the format of messages for several applications. Prominent examples include SOAP for Web services, XMPP for instant messaging, and RSS and Atom for content syndication. This usage of XML is understandable, as the format itself is a well-accepted standard for structured data, and it has excellent support in many popular programming languages, so inventing an application-specific format no longer seems worth the effort. Simultaneously with XML's rise to prominence, there has been an upsurge in the number and capabilities of various mobile devices. These devices are connected through various wireless technologies to larger networks, and a goal of current research is to integrate them seamlessly into these networks. These two developments seem to be at odds with each other. XML, as a fully text-based format, takes up more processing power and network bandwidth than binary formats would, whereas the battery-powered nature of mobile devices dictates that energy, both in processing and in transmission, be utilized efficiently. This thesis presents the work we have performed to reconcile these two worlds. We present a message transfer service that we have developed to address what we have identified as the three key issues: XML processing at the application level, a more efficient XML serialization format, and the protocol used to transfer messages. Our presentation includes both a high-level architectural view of the whole message transfer service and detailed descriptions of the three new components. These components consist of an API, and an associated data model, for XML processing designed for messaging applications; a binary serialization format for the data model of the API; and a message transfer protocol providing two-way messaging capability with support for client mobility. We also present relevant performance measurements for the service and its components.
As a result of this work, we do not consider XML to be inherently incompatible with mobile devices. As the fixed networking world moves toward XML for interoperable data representation, so should the wireless world, in order to provide a better-integrated networking infrastructure. However, the problems raised by XML adoption touch all of the higher layers of application programming, so instead of concentrating simply on the serialization format, we conclude that improvements need to be made in an integrated fashion across all of these layers.
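The space savings that a binary XML serialization can offer are easy to demonstrate with a toy encoding. The format below, with one-byte dictionary indices for repeated tag names and length-prefixed text values, is not the thesis's format, only a sketch of the general idea:

```python
import struct

# Toy binary encoding of an XML-like event stream: repeated tag names become
# one-byte dictionary indices, and text values carry a length prefix instead
# of surrounding markup.
def encode(tag_dict, events):
    # events: ("start", tag), ("text", value) or ("end",) tuples
    out = bytearray()
    for event in events:
        if event[0] == "start":
            out += bytes([0x01, tag_dict[event[1]]])
        elif event[0] == "text":
            data = event[1].encode("utf-8")
            out += bytes([0x02]) + struct.pack(">H", len(data)) + data
        else:  # "end"
            out += bytes([0x03])
    return bytes(out)

tags = {"message": 0, "to": 1, "body": 2}
events = [("start", "message"), ("start", "to"), ("text", "alice"), ("end",),
          ("start", "body"), ("text", "hello"), ("end",), ("end",)]
binary = encode(tags, events)
text = "<message><to>alice</to><body>hello</body></message>"
print(len(binary), len(text))  # the binary form is less than half the size
```

Even this crude scheme roughly halves the message size; the thesis's point is that such savings matter for both transmission energy and parsing effort on battery-powered devices.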
Abstract:
With the proliferation of wireless and mobile devices equipped with multiple radio interfaces to connect to the Internet, vertical handoffs involving different wireless access technologies will enable users to get the best connectivity and service quality during the lifetime of a TCP connection. A vertical handoff may introduce an abrupt, significant change in the access link characteristics, and as a result the end-to-end path characteristics, such as the bandwidth and the round-trip time (RTT) of a TCP connection, may change considerably. TCP may take several RTTs to adapt to these changes in path characteristics, and during this interval there may be packet losses and/or inefficient utilization of the available bandwidth. In this thesis we study the behaviour and performance of TCP in the presence of a vertical handoff. We identify the different handoff scenarios that adversely affect TCP performance. We propose several enhancements to the TCP sender algorithm, specific to the different handoff scenarios, that adapt TCP better to a vertical handoff. Our algorithms are conservative in nature and make use of cross-layer information obtained from the lower layers regarding the characteristics of the access links involved in a handoff. We evaluate the proposed algorithms by extensive simulation of the various handoff scenarios involving access links with a wide range of bandwidths and delays. We show that the proposed algorithms are effective in improving the TCP behaviour in various handoff scenarios and do not adversely affect the performance of TCP in the absence of cross-layer information.
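The flavour of such cross-layer adaptation can be sketched as follows. The concrete policy here, re-initializing slow-start state from the new link's advertised bandwidth-delay product, is invented for illustration and is not one of the thesis's algorithms:

```python
# After a vertical handoff, re-derive congestion-control state from the new
# access link's characteristics instead of adapting over several RTTs.
MSS = 1460  # maximum segment size in bytes (a common value)

def state_after_handoff(bandwidth_bps, rtt_s):
    # Bandwidth-delay product of the new path, expressed in segments.
    bdp_segments = max(1, int(bandwidth_bps * rtt_s / 8 / MSS))
    # Restart conservatively: slow-start towards the new path's capacity.
    return {"cwnd": 1, "ssthresh": bdp_segments}

# Handoff between a WLAN link (54 Mbit/s, 10 ms RTT) and a much slower
# cellular link (384 kbit/s, 200 ms RTT): the appropriate ssthresh differs
# by almost an order of magnitude.
print(state_after_handoff(54_000_000, 0.010))
print(state_after_handoff(384_000, 0.200))
```

The example shows why a sender that keeps its pre-handoff window can either flood a slow cellular link or badly underuse a fast WLAN link until normal congestion control catches up.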
Abstract:
In this thesis a manifold learning method is applied to the problem of WLAN positioning and automatic radio map creation. Due to the nature of WLAN signal strength measurements, a signal map created from raw measurements results in non-linear distance relations between measurement points. These signal strength vectors reside in a high-dimensional coordinate system. With the help of the so-called Isomap algorithm, the dimensionality of this map can be reduced, making it easier to process. By embedding position-labeled strategic key points, we can automatically adjust the mapping to match the surveyed environment. The environment is thus learned in a semi-supervised way; gathering training points and embedding them in a two-dimensional manifold gives us a rough mapping of the measured environment. After a calibration phase, in which the labeled key points in the training data are used to associate coordinates in the manifold representation with geographical locations, we can perform positioning using the adjusted map. This can be achieved through a traditional supervised learning process, which in our case is a simple nearest-neighbors matching of a sampled signal strength vector. We deployed this system at two locations on the Kumpula campus in Helsinki, Finland. Results indicate that positioning based on the learned radio map can achieve good accuracy, especially in hallways or other areas of the environment where the WLAN signal is constrained by obstacles such as walls.
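The final nearest-neighbors step can be sketched directly. The fingerprints and coordinates below are invented for illustration; a real radio map would come from the calibrated Isomap embedding:

```python
import math

# Nearest-neighbour positioning: return the map position whose stored
# signal-strength fingerprint is closest to the sampled vector.
radio_map = {
    (0.0, 0.0): [-40, -70, -80],   # (x, y) -> RSSI per access point, in dBm
    (5.0, 0.0): [-55, -60, -75],
    (5.0, 5.0): [-70, -50, -60],
    (0.0, 5.0): [-65, -72, -45],
}

def locate(sample):
    # Compare the sample against every fingerprint in Euclidean distance.
    return min(radio_map, key=lambda pos: math.dist(sample, radio_map[pos]))

print(locate([-54, -62, -76]))  # closest to the fingerprint at (5.0, 0.0)
```

In practice one would use the k nearest fingerprints and average their positions, but the single-neighbor form shown here matches the simple matching described in the abstract.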
Abstract:
Genotypes at different positions in the genome are associated if there is a statistical dependency between them. This thesis presents and compares methods for finding genotype associations between chromosomes. From the available genotype data sets, billions of inter-chromosomal candidate pairs of potentially associated genotypes can be formed. The search task can be divided into three separate parts: choosing a statistic that describes the strength of an association, computing the significance of a result, and selecting the sufficiently significant results. Concerning the choice of statistic and the computation of significance, the thesis presents a couple of traditional allele association measures, as well as more general independence tests such as the chi-squared test, the G-test, and various testing schemes based on random sampling. In addition, two methods for exact significance computation are proposed: a genotype-specific exact test and a maximum-deviation test. Concerning the selection of significant results, the thesis reviews the Bonferroni correction, which bounds the experiment-wise error probability, FDR control, which bounds the false discovery rate, and variants of these. Finally, a few of the presented methods are tested on both artificially generated and real genotype data, and the discovered associations are analysed. The test results reveal a set of strongly significant associations between chromosomes. Some of these can be explained by subpopulations within the population, and a few appear to result from incorrectly placed markers in the data. A large proportion of the dependencies is caused by three loci strongly associated with sex. Beyond these, a set of associations remains whose causes are unknown.
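The interplay between the independence test and the multiple-testing correction can be sketched on one hypothetical 2x2 allele contingency table. With billions of candidate pairs, the Bonferroni-adjusted threshold becomes extremely strict, so even a nominally tiny p-value may fail it:

```python
import math

# Chi-squared independence test on a 2x2 table, followed by a Bonferroni
# check. The counts and the number of tests are invented for illustration.
def chi_squared_2x2(table):
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row_total * col_total / n.
            exp = rows[i] * cols[j] / n
            stat += (obs - exp) ** 2 / exp
    # For one degree of freedom the p-value has a closed form via erfc.
    return stat, math.erfc(math.sqrt(stat / 2))

stat, p = chi_squared_2x2([[520, 480], [380, 620]])
n_tests = 1_000_000_000          # a billion candidate pairs
alpha = 0.05
print(f"chi2={stat:.1f}, p={p:.3g}")
print("significant after Bonferroni:", p < alpha / n_tests)
```

Here a p-value on the order of 10^-10 still fails the corrected threshold of 5x10^-11, which is why the thesis also considers the less conservative FDR-controlling procedures.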
Abstract:
After more than half a century of research, automatic natural language processing has become a very important area within computer science. Several scientifically important problems have been solved, and practical applications have reached the software market. Word sense disambiguation means finding the correct meaning of an ambiguous word. The context, the surrounding words, and knowledge of the subject domain are factors that can be used to disambiguate a word. Automatic summarization means shortening a text without losing the relevant information. Relevant sentences can be picked from the text, or a new, shorter text can be generated based on the facts in the original text. The thesis gives a general overview and brief history of natural language processing and compares some methods for word sense disambiguation and automatic summarization. The similarities and differences of the two problem areas are highlighted, and the position of the methods within computer science is discussed.
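The sentence-extraction approach to summarization mentioned above can be sketched in a few lines: score each sentence by the corpus frequency of its content words and keep the top-scoring sentences in their original order. The stopword list and example text are invented for illustration:

```python
import re
from collections import Counter

# Minimal frequency-based extractive summarizer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "that"}

def summarize(text, n_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOPWORDS)
    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in ranked)

text = ("Language processing is a broad field. "
        "Summarization shortens a text. "
        "Summarization keeps relevant sentences of the text.")
print(summarize(text, 1))
```

Generating a genuinely new, shorter text, the second approach the abstract mentions, requires far heavier machinery than this extraction baseline.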
Abstract:
The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as the task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is multi-view learning, where the task is to learn from multiple data sets, or views, describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements such as gene expression, protein expression, and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach for exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning to novel scenarios where the correspondence of samples between the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, the novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications, such as the matching of metabolites between humans and mice and the matching of sentences between documents in two languages.
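The matching task itself can be sketched as a minimum-cost assignment between two small sets of samples. Brute force over permutations is used below for clarity only; the view contents and sample names are invented, and the thesis's actual matching algorithm is far more scalable:

```python
import math
from itertools import permutations

# Toy one-to-one matching between two views of the same underlying samples:
# find the correspondence that minimizes the total pairwise distance.
view_a = {"m1": [0.1, 0.9], "m2": [0.8, 0.2], "m3": [0.5, 0.5]}
view_b = {"h1": [0.52, 0.48], "h2": [0.12, 0.88], "h3": [0.79, 0.22]}

def best_matching(a, b):
    a_keys, b_keys = list(a), list(b)
    best, best_cost = None, float("inf")
    for perm in permutations(b_keys):
        cost = sum(math.dist(a[x], b[y]) for x, y in zip(a_keys, perm))
        if cost < best_cost:
            best, best_cost = dict(zip(a_keys, perm)), cost
    return best

print(best_matching(view_a, view_b))
```

For realistic data set sizes the permutation search is replaced by a proper assignment algorithm, and the distances are computed in a learned shared representation rather than the raw feature spaces.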
Abstract:
As the virtual world grows more complex, finding a standard way of storing data becomes increasingly important. Ideally, each data item would be brought into the computer system only once. References to data items need to be cryptographically verifiable, so that the data can maintain its identity while being passed around. This way there will be only one copy of the user's family photo album, while the user can use multiple tools to show or manipulate the album. Copies of the user's data could be stored on some of his family members' computers and some of his own computers, but also at some online services that he uses. When all actors operate on one replicated copy of the data, the system automatically avoids a single point of failure. Thus the data will not disappear with one computer breaking or one service provider going out of business. One shared copy also makes it possible to delete a piece of data from all systems at once, at the user's request. In our research we tried to find a model that would make data manageable to users and make it possible to have the same data stored at various locations. We studied three systems, Persona, Freenet, and GNUnet, that suggest different models for protecting user data. The main application areas of the systems studied include securing online social networks, providing an anonymous web, and preventing censorship in file-sharing. Each of the systems studied stores user data on machines belonging to third parties. The systems differ in the measures they take to protect their users from data loss, forged information, censorship, and being monitored. All of the systems use cryptography to secure the names used for the content and to protect the data from outsiders. Based on the knowledge gained, we built a prototype platform called Peerscape, which stores user data in a synchronized, protected database.
Data items themselves are protected with cryptography against forgery, but not encrypted, as the focus has been on disseminating the data directly among family and friends rather than letting third parties store the information. We turned the synchronizing database into a peer-to-peer web by exposing its contents through an integrated HTTP server. The REST-like HTTP API supports the development of applications in JavaScript. To evaluate the platform's suitability for application development we wrote some simple applications, including a public chat room, a BitTorrent site, and a flower-growing game. During our early tests we came to the conclusion that using the platform for simple applications works well. As web standards develop further, writing applications for the platform should become easier. Any system this complex will have its problems, and we are not expecting our platform to replace the existing web, but we are fairly impressed with the results and consider our work important from the perspective of managing user data.
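The cryptographically verifiable references discussed above can be illustrated with content-hash naming: data is named by the hash of its content, so any copy fetched from any replica can be checked against its name. Peerscape's actual naming and signing scheme is more involved; this shows only the content-hash idea:

```python
import hashlib

def content_id(data: bytes) -> str:
    # Name a data item by the SHA-256 hash of its content.
    return hashlib.sha256(data).hexdigest()

def verify_copy(cid: str, data: bytes) -> bool:
    # A replica is authentic iff its hash matches the reference.
    return content_id(data) == cid

photo = b"...family photo album bytes..."
cid = content_id(photo)
print(verify_copy(cid, photo))          # an intact replica verifies: True
print(verify_copy(cid, photo + b"x"))   # a forged replica does not: False
```

Because the name is bound to the content, it does not matter which family member's machine or which online service served the bytes; the reference verifies the data wherever it was stored.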
Abstract:
A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. 
Furthermore, in the case of directed rooted trees, reducing a k-colouring to a 3-colouring requires at least log* k + 1 rounds for some values of k and is possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable in capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.
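The colour-reduction algorithms the thesis bounds build on the classic Cole-Vishkin technique, which can be simulated centrally. Each node compares its colour with its successor's, finds the lowest bit position where they differ, and adopts 2*position + bit as its new colour; repeating this shrinks a k-colouring towards O(1) colours in O(log* k) rounds. The thesis's tighter algorithms are found with SAT solvers; this is only the textbook baseline, with an arbitrary example colouring:

```python
# One synchronous round of Cole-Vishkin colour reduction on a directed cycle.
def cv_round(colours):
    n = len(colours)
    new = []
    for v in range(n):
        own, succ = colours[v], colours[(v + 1) % n]
        diff = own ^ succ
        i = (diff & -diff).bit_length() - 1   # index of the lowest differing bit
        new.append(2 * i + ((own >> i) & 1))
    return new

colours = [5, 12, 7, 3, 19, 8]               # a proper colouring of a 6-cycle
for _ in range(2):
    colours = cv_round(colours)
    # Each round keeps the colouring proper while shrinking the palette.
    assert all(colours[v] != colours[(v + 1) % len(colours)]
               for v in range(len(colours)))
print(colours)
```

Note that the reduction needs the lowest differing bit to exist, which holds exactly because the input colouring is proper; this is the symmetry-breaking role of the initial k-colouring mentioned in the abstract.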