37 results for DATA-STORAGE APPLICATIONS
in Helda - Digital Repository of University of Helsinki
Abstract:
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studies can be classified into four main themes: (i) developing advanced geospatial databases: Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) analysing species diversity and distribution using GIS techniques: Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) studying spatiotemporal forest cover change: Paper (III) presents a study of exotic and indigenous tree cover change detection in the Taita Hills, Kenya, using airborne imagery and GIS analysis techniques; (iv) exploring predictive modelling techniques using geospatial data: in Paper (IV) human population occurrence and abundance in the Taita Hills highlands was predicted using the generalized additive modelling (GAM) technique; Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi, Namibia; and Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi, Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for the Melitaea cinxia butterfly were successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at the habitat patch level or at coarser analysis scales. Moreover, this study showed that, at a large scale, spatially correlated weather conditions appear to be one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. In Paper (II) the spatiotemporal characteristics of Scopulini moth description, diversity and distribution were analysed at a world-wide scale, and for the first time GIS techniques were used for Scopulini moth geographical distribution analysis. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, with sub-Saharan Africa being the hot spot of species diversity. However, the taxonomic effort has been uneven among biogeographical regions. Paper (III) showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied using historical black-and-white aerial photography, care has to be taken in co-registration and image interpretation. In Paper (IV) human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, a land cover layer is not necessarily needed as a predictor, because first- and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper (V) showed that the generalized linear model (GLM) is a suitable technique for fire occurrence prediction and burned area estimation. GLM-based burned area estimates were found to be superior to the existing MODIS burned area product (MCD45A1). However, spatial autocorrelation of fires has to be taken into account when using the GLM technique for fire occurrence prediction. Paper (VI) showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, noticeable variation existed between the different predictive modelling techniques in fire occurrence prediction and burned area estimation.
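To make the modelling step concrete, the following is a minimal sketch of a logistic GLM for fire occurrence of the general kind referred to in Paper (V); the predictor names and data below are hypothetical placeholders, not the thesis dataset.

```python
# Minimal sketch: logistic GLM for fire-occurrence probability from
# geospatial predictors (hypothetical predictors and synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors per grid cell: vegetation index, distance to
# settlements (km), and preceding-season rainfall (mm).
X = np.column_stack([
    rng.uniform(0.1, 0.8, n),      # NDVI
    rng.uniform(0, 50, n),         # distance to settlements
    rng.uniform(200, 900, n),      # rainfall
])
# Synthetic binary response: 1 = cell burned, 0 = not burned.
logit = 2.5 * X[:, 0] - 0.04 * X[:, 1] - 0.002 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial())
fit = model.fit()
fire_prob = fit.predict(sm.add_constant(X))   # predicted burn probability per cell
print(fit.params, fire_prob[:5])
```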
Abstract:
Delay and disruption tolerant networks (DTNs) are computer networks where round-trip delays and error rates are high and disconnections are frequent. Examples of these extreme networks are space communications, sensor networks, connecting rural villages to the Internet, and even interconnecting commodity portable wireless devices and mobile phones. The basic elements of delay tolerant networks are store-and-forward message transfer resembling traditional mail delivery, opportunistic and intermittent routing, and an extensible cross-region resource naming service. Individual nodes of the network take an active part in routing the traffic and provide in-network data storage for application data that flows through the network. Application architectures for delay tolerant networks also differ from those used in traditional networks. It has become feasible to design applications that are network-aware and opportunistic, taking advantage of different network connection speeds and capabilities. This might change some of the basic paradigms of network application design. DTN protocols also support the design of applications whose processes must persist over reboots and power failures. DTN protocols could also be applicable to traditional networks in cases where high tolerance to delays or errors is desired. It is apparent that challenged networks also challenge the traditional, strictly layered model of network application design. This thesis provides an extensive introduction to delay tolerant networking concepts and applications. Most attention is given to the challenging problems of routing and application architecture. Finally, future prospects of DTN applications and implementations are envisioned through recent research results and an interview with an active researcher of DTN networks.
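As an illustration of the store-and-forward principle described above, the sketch below shows nodes buffering bundles and handing them over only when a contact opportunity appears; the classes and the naive forwarding policy are hypothetical and do not implement the actual Bundle Protocol.

```python
# Sketch of DTN-style store-and-forward: each node buffers bundles and
# forwards them only at contact opportunities (illustrative classes only).
from collections import deque
from dataclasses import dataclass

@dataclass
class Bundle:
    source: str
    destination: str
    payload: bytes

class DTNNode:
    def __init__(self, name):
        self.name = name
        self.storage = deque()           # in-network storage for bundles in transit

    def receive(self, bundle):
        if bundle.destination == self.name:
            print(f"{self.name}: delivered {bundle.payload!r}")
        else:
            self.storage.append(bundle)  # hold until a suitable contact appears

    def contact(self, peer):
        # Opportunistic forwarding: hand over buffered bundles to the peer
        # (naive epidemic-style policy, for illustration only).
        while self.storage:
            peer.receive(self.storage.popleft())

# Usage: village node -> bus acting as a data mule -> city gateway
village, bus, city = DTNNode("village"), DTNNode("bus"), DTNNode("city")
village.receive(Bundle("village", "city", b"sensor report"))
village.contact(bus)    # the bus visits the village
bus.contact(city)       # later, the bus reaches the city
```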
Abstract:
Place identification refers to the process of analyzing sensor data in order to detect places, i.e., spatial areas that are linked with activities and associated with meanings. Place information can be used, e.g., to provide awareness cues in applications that support social interactions, to provide personalized and location-sensitive information to the user, and to support mobile user studies by providing cues about the situations the study participant has encountered. Regularities in human movement patterns make it possible to detect personally meaningful places by analyzing the location traces of a user. This thesis focuses on providing system-level support for place identification, as well as on algorithmic issues related to the place identification process. The move from location to place requires interactions between location sensing technologies (e.g., GPS or GSM positioning), algorithms that identify places from location data, and applications and services that utilize place information. These interactions can be facilitated using a mobile platform, i.e., an application or framework that runs on a mobile phone. For the purposes of this thesis, mobile platforms automate data capture and processing and provide means for disseminating data to applications and other system components. The first contribution of the thesis is BeTelGeuse, a freely available, open source mobile platform that supports multiple runtime environments. The actual place identification process can be understood as a data analysis task where the goal is to analyze (location) measurements and to identify areas that are meaningful to the user. The second contribution of the thesis is the Dirichlet Process Clustering (DPCluster) algorithm, a novel place identification algorithm. The performance of the DPCluster algorithm is evaluated using twelve different datasets collected by different users, at different locations, and over different periods of time. As part of the evaluation we compare the DPCluster algorithm against other state-of-the-art place identification algorithms. The results indicate that the DPCluster algorithm provides improved generalization performance across spatial and temporal variations in location measurements.
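The location-to-place step can be illustrated with a simple clustering sketch; DBSCAN is used here only as a stand-in, since the thesis's DPCluster algorithm is based on Dirichlet process mixtures rather than density-based clustering. The coordinates and thresholds below are hypothetical.

```python
# Sketch of the location-to-place step: cluster raw GPS fixes into candidate
# places (DBSCAN as a stand-in; not the DPCluster algorithm).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
home = rng.normal([60.205, 24.962], 0.0005, size=(200, 2))    # hypothetical coordinates
work = rng.normal([60.170, 24.941], 0.0005, size=(150, 2))
noise = rng.uniform([60.15, 24.90], [60.22, 25.00], size=(30, 2))
fixes = np.vstack([home, work, noise])                         # (lat, lon) location trace

labels = DBSCAN(eps=0.002, min_samples=20).fit_predict(fixes)
for place in sorted(set(labels) - {-1}):
    centroid = fixes[labels == place].mean(axis=0)
    print(f"place {place}: centroid {centroid}, visits {(labels == place).sum()}")
```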
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful, e.g., in storage and indexing tasks in sequence databases, and segmentation can be used as a tool for learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation to certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined, and algorithms for solving them are provided and analyzed. Practical applications of segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis, segmentation applications are demonstrated in the analysis of genomic sequences.
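The basic piecewise-constant segmentation problem mentioned above can be solved exactly by dynamic programming; the following is a minimal sketch of that standard algorithm (not code from the thesis), minimising squared error for a fixed number of segments k in O(n^2 k) time.

```python
# Optimal piecewise-constant segmentation into k segments by dynamic
# programming, minimising total within-segment squared error.
import numpy as np

def segment(x, k):
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])      # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])  # prefix sums of squares

    def cost(i, j):  # squared error of one segment covering x[i:j]
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    INF = float("inf")
    dp = np.full((k + 1, n + 1), INF)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = dp[seg - 1, i] + cost(i, j)
                if c < dp[seg, j]:
                    dp[seg, j], cut[seg, j] = c, i
    # Recover the segment boundaries by backtracking.
    bounds, j = [], n
    for seg in range(k, 0, -1):
        bounds.append((cut[seg, j], j))
        j = cut[seg, j]
    return dp[k, n], bounds[::-1]

err, bounds = segment([1, 1, 1, 5, 5, 5, 2, 2], k=3)
print(err, bounds)   # boundaries of the three constant blocks
```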
Abstract:
In order to improve and continuously develop the quality of pharmaceutical products, the process analytical technology (PAT) framework has been adopted by the US Food and Drug Administration. One of the aims of PAT is to identify critical process parameters and their effect on the quality of the final product. Real-time analysis of the process data enables better control of the processes to obtain a high-quality product. The main purpose of this work was to monitor crucial pharmaceutical unit operations (from blending to coating) and to examine the effect of processing on solid-state transformations and physical properties. The tools used were near-infrared (NIR) and Raman spectroscopy combined with multivariate data analysis, as well as X-ray powder diffraction (XRPD) and terahertz pulsed imaging (TPI). To detect process-induced transformations in active pharmaceutical ingredients (APIs), samples were taken after blending, granulation, extrusion, spheronisation, and drying. These samples were monitored by XRPD, Raman, and NIR spectroscopy, which showed hydrate formation in the case of theophylline and nitrofurantoin. For erythromycin dihydrate, formation of the isomorphic dehydrate was critical. Thus, the main focus was on the drying process. NIR spectroscopy was applied in-line during a fluid-bed drying process. Multivariate data analysis (principal component analysis) enabled detection of the dehydrate formation at temperatures above 45°C. Furthermore, a small-scale rotating plate device was tested to provide an insight into film coating. The process was monitored using NIR spectroscopy. A calibration model, using partial least squares regression, was set up and applied to data obtained by in-line NIR measurements of a coating drum process. The predicted coating thickness agreed with the measured coating thickness. For investigating the quality of film coatings, TPI was used to create a 3-D image of a coated tablet. With this technique it was possible to determine coating layer thickness, distribution, reproducibility, and uniformity. In addition, it was possible to localise defects in either the coating or the tablet. It can be concluded from this work that the applied techniques increased the understanding of the physico-chemical properties of drugs and drug products during and after processing. They additionally provided useful information to improve and verify the quality of pharmaceutical dosage forms.
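As an illustration of the calibration step described above, the sketch below fits a partial least squares regression model that predicts coating thickness from NIR spectra; the spectra, reference thicknesses and number of components are synthetic placeholders, not the actual process data or model.

```python
# Sketch of a PLS calibration: predict coating thickness from NIR spectra
# (synthetic spectra and reference values, for illustration only).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
n_samples, n_wavelengths = 60, 200
thickness = rng.uniform(20, 120, n_samples)              # reference thickness (µm)
baseline = rng.normal(0, 0.01, (n_samples, n_wavelengths))
signal = np.outer(thickness, np.sin(np.linspace(0, 3, n_wavelengths)) * 1e-3)
spectra = baseline + signal                               # synthetic NIR absorbance

pls = PLSRegression(n_components=3)
pls.fit(spectra, thickness)
predicted = pls.predict(spectra).ravel()
print("calibration RMSE:", np.sqrt(np.mean((predicted - thickness) ** 2)))
```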
Abstract:
During the last decades there has been a global shift in forest management from a focus solely on timber management to ecosystem management that endorses all aspects of forest functions: ecological, economic and social. This has resulted in a shift in paradigm from sustained yield to sustained diversity of values, goods and benefits obtained at the same time, introducing new temporal and spatial scales into forest resource management. The purpose of the present dissertation was to develop methods that would enable spatial and temporal scales to be introduced into the storage, processing, access and utilization of forest resource data. The methods developed are based on a conceptual view of a forest as a hierarchically nested collection of objects that can have a dynamically changing set of attributes. The temporal aspect of the methods consists of lifetime management for the objects and their attributes and of a temporal succession linking the objects together. Development of the forest resource data processing method concentrated on the extensibility and configurability of the data content and model calculations, allowing for a diverse set of processing operations to be executed using the same framework. The contribution of this dissertation to the utilisation of multi-scale forest resource data lies in the development of a reference data generation method to support forest inventory methods in approaching single-tree resolution.
Abstract:
This work develops methods to account for shoot structure in models of coniferous canopy radiative transfer. Shoot structure, as it varies along the light gradient inside the canopy, affects the efficiency of light interception per unit needle area, foliage biomass, or foliage nitrogen. The clumping of needles in the shoot volume also causes a notable amount of multiple scattering of light within coniferous shoots. The effect of shoot structure on light interception is treated in the context of canopy-level photosynthesis and resource use models, and the phenomenon of within-shoot multiple scattering in the context of physical canopy reflectance models for remote sensing purposes. Light interception. A method for estimating the amount of PAR (Photosynthetically Active Radiation) intercepted by a conifer shoot is presented. The method combines modelling of the directional distribution of radiation above the canopy, fish-eye photographs taken at shoot locations to measure canopy gap fraction, and geometrical measurements of shoot orientation and structure. Data on light availability, shoot and needle structure, and nitrogen content were collected from canopies of Pacific silver fir (Abies amabilis (Dougl.) Forbes) and Norway spruce (Picea abies (L.) Karst.). Shoot structure acclimated to the light gradient inside the canopy so that more shaded shoots had better light interception efficiency. The light interception efficiency of shoots varied about two-fold per needle area, about four-fold per needle dry mass, and about five-fold per nitrogen content. Comparison of fertilized and control stands of Norway spruce indicated that light interception efficiency is not greatly affected by fertilization. Light scattering. The structure of coniferous shoots gives rise to multiple scattering of light between the needles of the shoot. Using geometric models of shoots, multiple scattering was studied by photon tracing simulations. Based on the simulation results, the dependence of the scattering coefficient of a shoot on the scattering coefficient of its needles is shown to follow a simple one-parameter model. The single parameter, termed the recollision probability, describes the level of clumping of the needles in the shoot, is wavelength independent, and can be connected to previously used clumping indices. By using the recollision probability to correct for within-shoot multiple scattering, canopy radiative transfer models which have used leaves as basic elements can use shoots as basic elements, and thus be applied to coniferous forests. Preliminary testing of this approach seems to explain, at least partially, why coniferous forests appear darker than broadleaved forests in satellite data.
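The one-parameter model referred to above is commonly written, in photon recollision probability theory, as

\[ \omega_{sh} = \frac{(1 - p)\,\omega_{L}}{1 - p\,\omega_{L}}, \]

where \(\omega_{L}\) is the needle scattering coefficient, \(\omega_{sh}\) the resulting shoot scattering coefficient, and \(p\) the recollision probability, i.e. the probability that a photon scattered by a needle interacts with the shoot again. This is the standard textbook form, obtained by summing the geometric series of successive within-shoot scattering events; the thesis may use a slightly different parameterisation.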
Abstract:
This thesis addresses the modeling of financial time series, especially stock market returns and daily price ranges. Modeling data of this kind can be approached with so-called multiplicative error models (MEM). These models nest several well-known time series models such as GARCH, ACD and CARR models. They are able to capture many well-established features of financial time series, including volatility clustering and leptokurtosis. In contrast to these phenomena, different kinds of asymmetries have received relatively little attention in the existing literature. In this thesis asymmetries arise from various sources. They are observed in both conditional and unconditional distributions, for variables with non-negative values and for variables that have values on the real line. In the multivariate context asymmetries can be observed in the marginal distributions as well as in the relationships between the variables modeled. New methods for all these cases are proposed. Chapter 2 considers GARCH models and the modeling of returns of two stock market indices. The chapter introduces the so-called generalized hyperbolic (GH) GARCH model to account for asymmetries in both the conditional and unconditional distribution. In particular, two special cases of the GARCH-GH model which describe the data most accurately are proposed. They are found to improve the fit of the model when compared to symmetric GARCH models. The advantages of accounting for asymmetries are also observed through Value-at-Risk applications. Both theoretical and empirical contributions are provided in Chapter 3 of the thesis. In this chapter the so-called mixture conditional autoregressive range (MCARR) model is introduced, examined and applied to daily price ranges of the Hang Seng Index. The conditions for the strict and weak stationarity of the model, as well as an expression for the autocorrelation function, are obtained by writing the MCARR model as a first-order autoregressive process with random coefficients. The chapter also introduces the inverse gamma (IG) distribution to CARR models. The advantages of the CARR-IG and MCARR-IG specifications over conventional CARR models are found in the empirical application both in- and out-of-sample. Chapter 4 discusses the simultaneous modeling of absolute returns and daily price ranges. In this part of the thesis a vector multiplicative error model (VMEM) with an asymmetric Gumbel copula is found to provide substantial benefits over the existing VMEM models based on elliptical copulas. The proposed specification is able to capture the highly asymmetric dependence of the modeled variables, thereby improving the performance of the model considerably. The economic significance of the results is established when the information content of the derived volatility forecasts is examined.
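For orientation, a multiplicative error model of the kind discussed here has, in its basic first-order form,

\[ x_t = \mu_t\,\varepsilon_t, \qquad \varepsilon_t \ge 0,\; \mathrm{E}\!\left[\varepsilon_t \mid \mathcal{F}_{t-1}\right] = 1, \qquad \mu_t = \omega + \alpha\, x_{t-1} + \beta\, \mu_{t-1}, \]

where \(x_t\) is a non-negative variable such as the daily price range and \(\mu_t\) its conditional mean; GARCH, ACD and CARR models are obtained with particular choices of \(x_t\) and the innovation distribution. This is the standard formulation, shown here only as background; the GH, IG and copula-based extensions studied in the thesis modify the innovation and dependence structure.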
Abstract:
Sensor networks represent an attractive tool to observe the physical world. Networks of tiny sensors can be used to detect a fire in a forest, to monitor the level of pollution in a river, or to check on the structural integrity of a bridge. Application-specific deployments of static sensor networks have been widely investigated. Commonly, these networks involve a centralized data-collection point and no sharing of data outside the organization that owns it. Although this approach can accommodate many application scenarios, it significantly deviates from the pervasive computing vision of ubiquitous sensing, where user applications seamlessly access, anytime and anywhere, data produced by sensors embedded in the surroundings. With the ubiquity and ever-increasing capabilities of mobile devices, urban environments can help give substance to the ubiquitous sensing vision through Urbanets, spontaneously created urban networks. Urbanets consist of mobile multi-sensor devices, such as smart phones and vehicular systems, public sensor networks deployed by municipalities, and individual sensors incorporated in buildings, roads, or daily artifacts. My thesis is that "multi-sensor mobile devices can be successfully programmed to become the underpinning elements of an open, infrastructure-less, distributed sensing platform that can bring sensor data out of their traditional closed-loop networks into everyday urban applications". Urbanets can support a variety of services ranging from emergency and surveillance to tourist guidance and entertainment. For instance, cars can be used to provide traffic information services to alert drivers to upcoming traffic jams, and phones to provide shopping recommender services to inform users of special offers at the mall. Urbanets cannot be programmed using traditional distributed computing models, which assume underlying networks with functionally homogeneous nodes, stable configurations, and known delays. Conversely, Urbanets have functionally heterogeneous nodes, volatile configurations, and unknown delays. Instead, solutions developed for sensor networks and mobile ad hoc networks can be leveraged to provide novel architectures that address Urbanet-specific requirements, while providing useful abstractions that hide the network complexity from the programmer. This dissertation presents two middleware architectures that can support mobile sensing applications in Urbanets. Contory offers a declarative programming model that views Urbanets as a distributed sensor database and exposes an SQL-like interface to developers. Context-aware Migratory Services provides a client-server paradigm, where services are capable of migrating to different nodes in the network in order to maintain a continuous and semantically correct interaction with clients. Compared to previous approaches to supporting mobile sensing urban applications, our architectures are entirely distributed and do not assume constant availability of Internet connectivity. In addition, they allow on-demand collection of sensor data with the accuracy and at the frequency required by each application. These architectures have been implemented in Java and tested on smart phones. They have proved successful in supporting several prototype applications, and experimental results obtained in ad hoc networks of phones have demonstrated their feasibility with reasonable performance in terms of latency, memory, and energy consumption.
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
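For reference, the NML distribution underlying the modern definition of stochastic complexity is, for a model class \(\mathcal{M}\) and discrete data \(x^n\),

\[ P_{\mathrm{NML}}(x^n \mid \mathcal{M}) = \frac{P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}{\sum_{y^n} P\bigl(y^n \mid \hat{\theta}(y^n), \mathcal{M}\bigr)}, \qquad \mathrm{SC}(x^n \mid \mathcal{M}) = -\log P_{\mathrm{NML}}(x^n \mid \mathcal{M}), \]

where \(\hat{\theta}(\cdot)\) denotes the maximum likelihood parameters. The exponential sum in the denominator (the normalising term) is exactly the quantity whose efficient computation the dissertation addresses.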
Abstract:
Cell transition data is obtained from a cellular phone as it switches its current serving cell tower. The data consist of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is on applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in an online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and by keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
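To illustrate the kind of online processing involved, the sketch below merges cells between which the phone oscillates within a short time window into one location cluster; the merging policy and the threshold are illustrative assumptions only, not the algorithms developed in the thesis.

```python
# Illustrative online clustering of cell transitions: rapid back-and-forth
# transitions between cells are taken to indicate a single physical place.
OSCILLATION_WINDOW = 60.0   # seconds; hypothetical threshold

class OnlineCellClusterer:
    def __init__(self):
        self.cluster_of = {}     # cell id -> cluster id
        self.next_cluster = 0
        self.prev = None         # (cell, time) of the previous transition

    def observe(self, cell, t):
        if cell not in self.cluster_of:
            self.cluster_of[cell] = self.next_cluster
            self.next_cluster += 1
        if self.prev is not None:
            prev_cell, prev_t = self.prev
            if t - prev_t < OSCILLATION_WINDOW:
                # Fast oscillation suggests one place: merge the two clusters.
                old, new = self.cluster_of[cell], self.cluster_of[prev_cell]
                for c, k in self.cluster_of.items():
                    if k == old:
                        self.cluster_of[c] = new
        self.prev = (cell, t)
        return self.cluster_of[cell]

clusterer = OnlineCellClusterer()
for cell, t in [("A", 0), ("B", 20), ("A", 35), ("C", 600)]:
    print(cell, "->", clusterer.observe(cell, t))
```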
Abstract:
Current smartphones have a storage capacity of several gigabytes, and more and more information is stored on mobile devices. To meet the challenge of information organization, we turn to desktop search. Users often possess multiple devices and synchronize (subsets of) information between them, which makes file synchronization increasingly important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text, and the EXIF data of JPEG files may be used in finding them. User-defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query. The Dessy prototype uses the BM25 ranking function, widely used in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization, optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees. It allows finding and synchronizing files that reside on remote computers or on the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization, also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine over a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption, and the symmetric encryption keys are exchanged with RSA key exchange. Dessy emphasizes extensibility: even the cryptography can be extended. Users may tag their files with context tags and control custom file metadata, and adding new indexed file types, metadata fields, ranking methods, and index types is easy. Finding files is done with virtual directories, which are views into the user's files, browseable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind. It requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
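For orientation, the following is a sketch of the standard BM25 ranking function that the Dessy prototype is described as using; the parameter values are conventional defaults and the example documents are hypothetical, not values or data from Dessy.

```python
# Standard BM25 scoring of documents (files) against a query.
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    n = len(documents)
    avgdl = sum(len(d) for d in documents) / n
    df = Counter(t for d in documents for t in set(d))   # document frequencies
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [["budget", "report", "pdf"], ["holiday", "photos"], ["budget", "draft"]]
print(bm25_scores(["budget", "report"], docs))   # highest score for the first file
```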
Abstract:
The tackling of coastal eutrophication requires water protection measures based on status assessments of water quality. The main purpose of this thesis was to evaluate whether it is possible, both scientifically and within the terms of the European Union Water Framework Directive (WFD), to assess the status of coastal marine waters reliably by using phytoplankton biomass (ww) and chlorophyll a (Chl) as indicators of eutrophication in Finnish coastal waters. Empirical approaches were used to study whether the criteria established for determining an indicator are fulfilled. The first criterion (i) was that an indicator should respond to anthropogenic stresses in a predictable manner and have low variability in its response. Summertime Chl could be predicted accurately from nutrient concentrations, but not from the external annual loads alone, because of the rapid effect of primary production and sedimentation close to the loading sources in summer. The most accurate predictions were achieved in the Archipelago Sea, where total phosphorus (TP) and total nitrogen (TN) alone accounted for 87% and 78% of the variation in Chl, respectively. In river estuaries, the TP mass-balance regression model predicted Chl most accurately when nutrients originated from point sources, whereas land-use regression models were most accurate when nutrients originated mainly from diffuse sources. The inclusion of morphometry (e.g. mean depth) into the nutrient models improved the accuracy of the predictions. The second criterion (ii) was associated with the WFD. It requires that an indicator should have type-specific reference conditions, which are defined as "conditions where the values of the biological quality elements are at high ecological status". In establishing reference conditions, the empirical approach could be used only in the outer coastal water types, where historical observations of Secchi depth from the early 1900s are available. The most accurate prediction was achieved in the Quark. In the inner coastal water types, reference Chl values, estimated from present monitoring data, are imprecise, not only because of the less accurate estimation method but also because the intrinsic characteristics, described for instance by morphometry, vary considerably within these extensive inner coastal types. For phytoplankton biomass, the reference values were less accurate than in the case of Chl, because reference conditions for biomass could only be estimated using the reconstructed Chl values, not the historical Secchi observations. A paleoecological approach was also applied to estimate annual average reference conditions for Chl. In Laajalahti, an urban embayment off Helsinki that was strongly loaded by municipal waste waters in the 1960s and 1970s, reference conditions prevailed in the mid- and late 1800s. The recovery of the bay from pollution has been delayed as a consequence of benthic release of nutrients, and Laajalahti will probably not achieve the good quality objectives of the WFD on time. The third criterion (iii) was associated with coastal management, including the resources it has available. Analyses of Chl are cheap and fast to carry out compared to analyses of phytoplankton biomass and species composition, a fact which affects the number of samples that can be taken and thereby the reliability of the assessments. However, analyses of phytoplankton biomass and species composition provide more metrics for ecological classification, metrics which reveal various aspects of eutrophication that Chl alone does not.
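The nutrient regression models behind the reported figures can be illustrated with a minimal sketch of a chlorophyll a regression on total phosphorus and total nitrogen; the data and coefficients below are synthetic placeholders, not the thesis monitoring data or estimates.

```python
# Sketch: summertime chlorophyll a regressed on TP and TN (log scales),
# reporting the share of variation explained (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 80
log_tp = rng.uniform(np.log(10), np.log(100), n)    # total phosphorus, µg/l
log_tn = rng.uniform(np.log(300), np.log(1200), n)  # total nitrogen, µg/l
log_chl = 0.9 * log_tp + 0.3 * log_tn - 2.0 + rng.normal(0, 0.2, n)

X = np.column_stack([log_tp, log_tn])
model = LinearRegression().fit(X, log_chl)
print("share of Chl variation explained:", model.score(X, log_chl))
```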
Abstract:
Transposons are mobile elements of genetic material that are able to move in the genomes of their host organisms using a special form of recombination called transposition. Bacteriophage Mu was the first transposon for which a cell-free in vitro transposition reaction was developed. Subsequently, the reaction has been refined, and the minimal Mu in vitro reaction is useful in the generation of comprehensive libraries of mutant DNA molecules that can be used in a variety of applications. To date, the functional genetics applications of Mu in vitro technology have been applied either to plasmids or to genomic regions and entire genomes of viruses cloned in specific vectors. This study expands the use of Mu in vitro transposition in functional genetics and genomics by describing novel methods applicable to the targeted transgenesis of the mouse and to the whole-genome analysis of bacteriophages. The methods described here are rapid, efficient, and easily applicable to a wide variety of organisms, demonstrating the potential of the Mu transposition technology in the functional analysis of genes and genomes. First, an easy-to-use, rapid strategy to generate constructs for the targeted mutagenesis of mouse genes was developed. To test the strategy, a gene encoding a neuronal K+/Cl- cotransporter was mutagenised. After a highly efficient transpositional mutagenesis, the mutagenised gene fragments were cloned into a vector backbone and transferred into bacterial cells. These constructs were screened with PCR using an effective 3D matrix system. In addition to traditional knock-out constructs, the method developed yields hypomorphic alleles that lead to reduced expression of the target gene in transgenic mice and have since been used in a follow-up study. Moreover, a scheme is devised to rapidly produce conditional alleles from the constructs produced. Next, an efficient strategy for the whole-genome analysis of bacteriophages was developed based on the transpositional mutagenesis of uncloned, infective virus genomes and their subsequent transfer into susceptible host cells. Mutant viruses able to produce viable progeny were collected and their transposon integration sites determined to map genomic regions nonessential to the viral life cycle. This method, applied here to three very different bacteriophages, PRD1, ΦYeO3-12, and PM2, does not require the target genome to be cloned and is directly applicable to all DNA and RNA viruses that have infective genomes. The method developed yielded valuable novel information on the three bacteriophages studied, and the whole-genome data can be complemented with concomitant studies on individual genes. Moreover, the end-modified transposons constructed for this study can be used to manipulate genomes devoid of suitable restriction sites.