11 results for LZ77 compressione algoritmi CPS1 CPS2 fattorizzazione decodifica
in Helda - Digital Repository of University of Helsinki
Abstract:
This thesis integrates real-time feedback control into an optical tweezers instrument. The goal is to reduce the variance in the trapped bead's position, effectively increasing the trap stiffness of the optical tweezers. Trap steering is done with acousto-optic deflectors, and the control algorithms are implemented with a field-programmable gate array card. When position clamp feedback control is on, the effective trap stiffness increases 12.1-fold compared to the stiffness without control. This allows improved spatial control over trapped particles without increasing the trapping laser power.
Abstract:
Determination of testosterone and related compounds in body fluids is of utmost importance in doping control and the diagnosis of many diseases. Capillary electromigration techniques are a relatively new approach for steroid research. Owing to their electrical neutrality, however, separation of steroids by capillary electromigration techniques requires the use of charged electrolyte additives that interact with the steroids either specifically or non-specifically. The analysis of testosterone and related steroids by non-specific micellar electrokinetic chromatography (MEKC) was investigated in this study. The partial filling (PF) technique was employed, being suitable for detection by both ultraviolet spectrophotometry (UV) and electrospray ionization mass spectrometry (ESI-MS). Efficient, quantitative PF-MEKC UV methods for steroid standards were developed through the use of optimized pseudostationary phases comprising surfactants and cyclodextrins. PF-MEKC UV proved to be a more sensitive, efficient and repeatable method for the steroids than PF-MEKC ESI-MS. It was discovered that in PF-MEKC analyses of electrically neutral steroids, ESI-MS interfacing sets significant limitations not only on the chemistry affecting the ionization and detection processes, but also on the separation. The new PF-MEKC UV method was successfully employed in the determination of testosterone in male urine samples after microscale immunoaffinity solid-phase extraction (IA-SPE). The IA-SPE method, relying on specific interactions between testosterone and a recombinant anti-testosterone Fab fragment, is the first such method described for testosterone. Finally, new data for interactions between steroids and human and bovine serum albumins were obtained through the use of affinity capillary electrophoresis. A new algorithm for the calculation of association constants between proteins and neutral ligands is introduced.
Abstract:
The focus of this study is on the statistical analysis of categorical responses where the response values are dependent on each other. The most typical example of this kind of dependence is when repeated responses have been obtained from the same study unit. For example, in Paper I, the response of interest is pneumococcal nasopharyngeal carriage (yes/no) in 329 children. For each child, the carriage is measured nine times during the first 18 months of life, and thus repeated responses on each child cannot be assumed independent of each other. In the case of the above example, the interest typically lies in the carriage prevalence, and whether different risk factors affect the prevalence. Regression analysis is the established method for studying the effects of risk factors. In order to make correct inferences from the regression model, the associations between repeated responses need to be taken into account. The analysis of repeated categorical responses typically focuses on regression modelling. However, further insights can also be gained by investigating the structure of the association. The central theme of this study is the development of joint regression and association models. The analysis of repeated, or otherwise clustered, categorical responses is computationally difficult. Likelihood-based inference is often feasible only when the number of repeated responses for each study unit is small. In Paper IV, an algorithm is presented that substantially facilitates maximum likelihood fitting, especially when the number of repeated responses increases. In addition, a notable result arising from this work is the freely available software for likelihood-based estimation of clustered categorical responses.
Abstract:
Place identification refers to the process of analyzing sensor data in order to detect places, i.e., spatial areas that are linked with activities and associated with meanings. Place information can be used, e.g., to provide awareness cues in applications that support social interactions, to provide personalized and location-sensitive information to the user, and to support mobile user studies by providing cues about the situations the study participant has encountered. Regularities in human movement patterns make it possible to detect personally meaningful places by analyzing location traces of a user. This thesis focuses on providing system level support for place identification, as well as on algorithmic issues related to the place identification process. The move from location to place requires interactions between location sensing technologies (e.g., GPS or GSM positioning), algorithms that identify places from location data and applications and services that utilize place information. These interactions can be facilitated using a mobile platform, i.e., an application or framework that runs on a mobile phone. For the purposes of this thesis, mobile platforms automate data capture and processing and provide means for disseminating data to applications and other system components. The first contribution of the thesis is BeTelGeuse, a freely available, open source mobile platform that supports multiple runtime environments. The actual place identification process can be understood as a data analysis task where the goal is to analyze (location) measurements and to identify areas that are meaningful to the user. The second contribution of the thesis is the Dirichlet Process Clustering (DPCluster) algorithm, a novel place identification algorithm. The performance of the DPCluster algorithm is evaluated using twelve different datasets that have been collected by different users, at different locations and over different periods of time. 
As part of the evaluation we compare the DPCluster algorithm against other state-of-the-art place identification algorithms. The results indicate that the DPCluster algorithm provides improved generalization performance against spatial and temporal variations in location measurements.
Abstract:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
Abstract:
This thesis studies optimisation problems related to modern large-scale distributed systems, such as wireless sensor networks and wireless ad-hoc networks. The concrete tasks that we use as motivating examples are the following: (i) maximising the lifetime of a battery-powered wireless sensor network, (ii) maximising the capacity of a wireless communication network, and (iii) minimising the number of sensors in a surveillance application. A sensor node consumes energy both when it is transmitting or forwarding data, and when it is performing measurements. Hence task (i), lifetime maximisation, can be approached from two different perspectives. First, we can seek for optimal data flows that make the most out of the energy resources available in the network; such optimisation problems are examples of so-called max-min linear programs. Second, we can conserve energy by putting redundant sensors into sleep mode; we arrive at the sleep scheduling problem, in which the objective is to find an optimal schedule that determines when each sensor node is asleep and when it is awake. In a wireless network simultaneous radio transmissions may interfere with each other. Task (ii), capacity maximisation, therefore gives rise to another scheduling problem, the activity scheduling problem, in which the objective is to find a minimum-length conflict-free schedule that satisfies the data transmission requirements of all wireless communication links. Task (iii), minimising the number of sensors, is related to the classical graph problem of finding a minimum dominating set. However, if we are not only interested in detecting an intruder but also locating the intruder, it is not sufficient to solve the dominating set problem; formulations such as minimum-size identifying codes and locating dominating codes are more appropriate. 
This thesis presents approximation algorithms for each of these optimisation problems, i.e., for max-min linear programs, sleep scheduling, activity scheduling, identifying codes, and locating dominating codes. Two complementary approaches are taken. The main focus is on local algorithms, which are constant-time distributed algorithms. The contributions include local approximation algorithms for max-min linear programs, sleep scheduling, and activity scheduling. In the case of max-min linear programs, tight upper and lower bounds are proved for the best possible approximation ratio that can be achieved by any local algorithm. The second approach is the study of centralised polynomial-time algorithms in local graphs; these are geometric graphs whose structure exhibits spatial locality. Among other contributions, it is shown that while identifying codes and locating dominating codes are hard to approximate in general graphs, they admit a polynomial-time approximation scheme in local graphs.
Abstract:
Extraintestinal pathogenic Escherichia coli (ExPEC) represent a diverse group of strains of E. coli, which infect extraintestinal sites, such as the urinary tract, the bloodstream, the meninges, the peritoneal cavity, and the lungs. Urinary tract infections (UTIs) caused by uropathogenic E. coli (UPEC), the major subgroup of ExPEC, are among the most prevalent microbial diseases worldwide and a substantial burden for public health care systems. UTIs are responsible for serious morbidity and mortality in the elderly, in young children, and in immune-compromised and hospitalized patients. ExPEC strains are different, both from genetic and clinical perspectives, from commensal E. coli strains belonging to the normal intestinal flora and from intestinal pathogenic E. coli strains causing diarrhea. ExPEC strains are characterized by a broad range of alternate virulence factors, such as adhesins, toxins, and iron accumulation systems. Unlike diarrheagenic E. coli, whose distinctive virulence determinants evoke characteristic diarrheagenic symptoms and signs, ExPEC strains are exceedingly heterogeneous and are known to possess no specific virulence factor or set of factors that is obligatory for the infection of a certain extraintestinal site (e.g. the urinary tract). The ExPEC genomes are highly diverse mosaic structures in permanent flux. These strains have obtained a significant amount of DNA (predictably up to 25% of the genomes) through acquisition of foreign DNA from diverse related or non-related donor species by lateral transfer of mobile genetic elements, including pathogenicity islands (PAIs), plasmids, phages, transposons, and insertion elements. The ability of ExPEC strains to cause disease is mainly derived from this horizontally acquired gene pool; the exogenous DNA facilitates rapid adaptation of the pathogen to changing conditions and hence widens the spectrum of sites that can be infected. 
However, neither the amount of unique DNA in different ExPEC strains (or UPEC strains) nor the mechanisms lying behind the observed genomic mobility are known. Due to this extreme heterogeneity of the UPEC and ExPEC populations in general, the routine surveillance of ExPEC is exceedingly difficult. In this project, we presented a novel virulence gene algorithm (VGA) for the estimation of the extraintestinal virulence potential (VP, pathogenicity risk) of clinically relevant ExPECs and fecal E. coli isolates. The VGA was based on a DNA microarray specific for the ExPEC phenotype (ExPEC pathoarray). This array contained 77 DNA probes homologous with known (e.g. adhesion factors, iron accumulation systems, and toxins) and putative (e.g. genes predictably involved in adhesion, iron uptake, or in metabolic functions) ExPEC virulence determinants. In total, 25 of the DNA probes homologous with known virulence factors and 36 of the DNA probes representing putative extraintestinal virulence determinants were found at significantly higher frequency in virulent ExPEC isolates than in commensal E. coli strains. We showed that the ExPEC pathoarray and the VGA could be readily used for the differentiation of highly virulent ExPECs both from less virulent ExPEC clones and from commensal E. coli strains. When the VGA was applied to a group of unknown ExPECs (n=53) and fecal E. coli isolates (n=37), 83% of strains were correctly identified as extraintestinally virulent or commensal E. coli. Conversely, 15% of clinical ExPECs and 19% of fecal E. coli strains failed to fall into their respective pathogenic and non-pathogenic groups. Clinical data and virulence gene profiles of these strains warranted the estimated VPs; UPEC strains with atypically low risk-ratios were largely isolated from patients with certain medical history, including diabetes mellitus or catheterization, or from elderly patients. 
In addition, fecal E. coli strains with VPs characteristic for ExPEC were shown to represent the diagnostically important fraction of resident strains of the gut flora with a high potential of causing extraintestinal infections. Interestingly, a large fraction of DNA probes associated with the ExPEC phenotype corresponded to novel DNA sequences without any known function in UTIs and thus represented new genetic markers for extraintestinal virulence. These DNA probes included unknown DNA sequences originating from the genomic subtractions of four clinical ExPEC isolates as well as from five novel cosmid sequences identified in the UPEC strains HE300 and JS299. The characterized cosmid sequences (pJS332, pJS448, pJS666, pJS700, and pJS706) revealed complex modular DNA structures with known and unknown DNA fragments arranged in a puzzle-like manner and integrated into the common E. coli genomic backbone. Furthermore, cosmid pJS332 of the UPEC strain HE300, which carried a chromosomal virulence gene cluster (iroBCDEN) encoding the salmochelin siderophore system, was shown to be part of a transmissible plasmid of Salmonella enterica. Taken together, the results of this project pointed towards the assumptions that (i) homologous recombination, even within coding genes, contributes to the observed mosaicism of ExPEC genomes and (ii) besides en bloc transfer of large DNA regions (e.g. chromosomal PAIs), rearrangements of small DNA modules also provide a means of genomic plasticity. The data presented in this project supplemented previous whole genome sequencing projects of E. coli and indicated that each E. coli genome displays a unique assemblage of individual mosaic structures, which enable these strains to successfully colonize and infect different anatomical sites.
Abstract:
Volatile organic compounds (VOCs) affect atmospheric chemistry and thereby also influence climate change in many ways. The long-lived greenhouse gases and tropospheric ozone are the most important radiative forcing components warming the climate, while aerosols are the most important cooling component. VOCs can have warming effects on the climate: they participate in tropospheric ozone formation and compete for oxidants with the greenhouse gases, thus, for example, lengthening the atmospheric lifetime of methane. Some VOCs, on the other hand, cool the atmosphere by taking part in the formation of aerosol particles. Some VOCs, in addition, have direct health effects, such as carcinogenic benzene. VOCs are emitted into the atmosphere in various processes. Primary emissions of VOCs include biogenic emissions from vegetation, biomass burning and human activities. VOCs are also produced in secondary emissions from the reactions of other organic compounds. Globally, forests are the largest source of VOCs entering the atmosphere. This thesis focuses on the measurement results of emissions and concentrations of VOCs in one of the largest vegetation zones in the world, the boreal zone. An automated sampling system was designed and built for continuous VOC concentration and emission measurements with a proton transfer reaction - mass spectrometer (PTR-MS). The system measured one hour at a time in three-hourly cycles: 1) ambient volume mixing-ratios of VOCs in the Scots-pine-dominated boreal forest, 2) VOC fluxes above the canopy, and 3) VOC emissions from Scots pine shoots. In addition to the online PTR-MS measurements, we determined the composition and seasonality of the VOC emissions from a Siberian larch with adsorbent samples and GC-MS analysis. The VOC emissions from Siberian larch were reported for the first time in the literature. The VOC emissions were 90% monoterpenes (mainly sabinene) and the rest sesquiterpenes (mainly α-farnesene). 
The normalized monoterpene emission potentials were highest in late summer, rising again in late autumn. The normalized sesquiterpene emission potentials were also highest in late summer, but decreased towards the autumn. The emissions of mono- and sesquiterpenes from the deciduous Siberian larch, as well as the emissions of monoterpenes measured from the evergreen Scots pine, were well described by the temperature-dependent algorithm. In the Scots-pine-dominated forest, canopy-scale emissions of monoterpenes and oxygenated VOCs (OVOCs) were of the same magnitude. Methanol and acetone were the most abundant OVOCs emitted from the forest and also in the ambient air. Annually, methanol and mixing ratios were of the order of 1 ppbv. The monoterpene and sum of isoprene + 2-methyl-3-buten-2-ol (MBO) volume mixing-ratios were an order of magnitude lower. The majority of the monoterpene and methanol emissions from the Scots-pine-dominated forest were explained by emissions from Scots pine shoots. The VOCs were divided into three classes based on the dynamics of the summer-time concentrations: 1) reactive compounds with local biological, anthropogenic or chemical sources (methanol, acetone, butanol and hexanal), 2) compounds whose emissions are only temperature-dependent (monoterpenes), 3) long-lived compounds (benzene, acetaldehyde). Biogenic VOC (methanol, acetone, isoprene + MBO and monoterpene) volume mixing-ratios had clear diurnal patterns during summer. The ambient mixing ratios of other VOCs did not show this behaviour. During winter we did not observe systematic diurnal cycles for any of the VOCs. Different sources, removal processes and turbulent mixing explained the dynamics of the measured mixing-ratios qualitatively. However, quantitative understanding will require long-term emission measurements of the OVOCs and the use of comprehensive chemistry models. Keywords: Hydrocarbons, VOC, fluxes, volume mixing-ratio, boreal forest
Abstract:
The thesis presents a state-space model for a basketball league and a Kalman filter algorithm for the estimation of the state of the league. In the state-space model, each of the basketball teams is associated with a rating that represents its strength compared to the other teams. The ratings are assumed to evolve in time following a stochastic process with independent Gaussian increments. The estimation of the team ratings is based on the observed game scores, which are assumed to depend linearly on the true strengths of the teams and independent Gaussian noise. The team ratings are estimated using a recursive Kalman filter algorithm that produces least squares optimal estimates for the team strengths and predictions for the scores of the future games. Additionally, if the Gaussianity assumption holds, the predictions given by the Kalman filter maximize the likelihood of the observed scores. The team ratings allow probabilistic inference about the ranking of the teams and their relative strengths as well as about the teams' winning probabilities in future games. The predictions about the winners of the games are correct 65-70% of the time. The team ratings explain 16% of the random variation observed in the game scores. Furthermore, the winning probabilities given by the model are consistent with the observed scores. The state-space model includes four independent parameters that involve the variances of noise terms and the home court advantage observed in the scores. The thesis presents the estimation of these parameters using the maximum likelihood method as well as using other techniques. The thesis also gives various example analyses related to the American professional basketball league, i.e., the National Basketball Association (NBA), and the regular seasons played in the years 2005 through 2010. Additionally, the 2009-2010 season is discussed in full detail, including the playoffs.
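As a rough illustration of the kind of recursion the abstract describes, the sketch below performs one Kalman predict/update step for a single game. It is a minimal sketch under stated assumptions, not the thesis's model: the observation is taken to be the score margin (home minus away), and the function name and the values of `h` (home advantage), `q` (rating drift variance) and `r` (score noise variance) are hypothetical.

```python
def kalman_game_update(x, P, home, away, margin, h=3.0, q=0.5, r=120.0):
    """One Kalman predict/update step for a single game (illustrative only).

    x      : list of rating estimates, one per team
    P      : covariance matrix of the ratings (list of lists, symmetric)
    margin : observed home score minus away score (assumed observation)
    h, q, r: hypothetical home advantage, drift variance, noise variance
    """
    n = len(x)
    # Predict: ratings follow a random walk with independent Gaussian increments.
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Observation model: margin = x[home] - x[away] + h + noise,
    # so the observation row H has +1 at `home` and -1 at `away`.
    PH = [P[i][home] - P[i][away] for i in range(n)]   # P @ H^T
    S = PH[home] - PH[away] + r                        # innovation variance
    predicted = x[home] - x[away] + h
    innovation = margin - predicted
    K = [v / S for v in PH]                            # Kalman gain
    x_new = [x[i] + K[i] * innovation for i in range(n)]
    # Covariance update P - K (H P); H P equals PH because P is symmetric.
    P_new = [[P[i][j] - K[i] * PH[j] for j in range(n)] for i in range(n)]
    return x_new, P_new, predicted
```

Calling this once per game, in chronological order, yields running rating estimates and a predicted margin for each game before its result is used.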
Abstract:
A significant part of the running time of algorithms is often spent moving data between the layers of the memory hierarchy. The problem is especially pronounced for search structures, because they handle large amounts of data. The goal of this work is to determine the practical benefits achievable by minimizing memory transfers in the case of search trees. A second goal is to survey the advantages and weaknesses of cache-oblivious search trees, which are based on the ideal-cache model, relative to cache-aware search trees, which are based on the external-memory model. Cache-obliviousness means that the algorithm does not know the parameters of the design model used, such as the cache size. Of static search trees, the work examines binary search trees stored in breadth-first order, preorder, and van Emde Boas order, as well as a static B-tree. Of dynamic search trees, it covers the B+-tree and a cache-oblivious B-tree, the COB-tree. Both cache-oblivious and cache-aware search trees aim to minimize the number of memory transfers required by improving the locality of computation. The practical speed of the search trees covered is tested extensively. Based on the results, both static and dynamic cache-oblivious search trees are competitive with the corresponding cache-aware search trees in the speed of random operations. They fall somewhat behind, however, when performing sequential operations.
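To make the van Emde Boas order mentioned above concrete, the following sketch computes the storage order of a complete binary tree whose nodes carry breadth-first (heap) indices starting at 1. It is only an illustration under assumed conventions: the function name is hypothetical, and splitting the height as `height // 2` for the top part is one common choice, not necessarily the one used in the thesis.

```python
def veb_order(root=1, height=3):
    """Return the heap indices of a complete binary tree in van Emde Boas order.

    The tree of the given height is cut into a top subtree and several bottom
    subtrees; the top is laid out first, then each bottom subtree from left to
    right, each of them recursively in the same way.
    """
    if height == 1:
        return [root]
    top_h = height // 2          # assumed split convention
    bottom_h = height - top_h
    order = veb_order(root, top_h)
    # Roots of the bottom subtrees are the descendants of `root` at depth top_h.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(r, bottom_h))
    return order
```

For a tree of height 4 this places nodes 1, 2, 3 first and then each three-node bottom subtree contiguously, so a root-to-leaf search touches only a few contiguous memory regions regardless of the block size.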
Abstract:
Bayesian networks are compact, flexible, and interpretable representations of a joint distribution. When the network structure is unknown but there are observational data at hand, one can try to learn the network structure. This is called structure discovery. This thesis contributes to two areas of structure discovery in Bayesian networks: space-time tradeoffs and learning ancestor relations. The fastest exact algorithms for structure discovery in Bayesian networks are based on dynamic programming and use excessive amounts of space. Motivated by the space usage, several schemes for trading space against time are presented. These schemes are presented in a general setting for a class of computational problems called permutation problems; structure discovery in Bayesian networks is seen as a challenging variant of the permutation problems. The main contribution in the area of space-time tradeoffs is the partial order approach, in which the standard dynamic programming algorithm is extended to run over partial orders. In particular, a certain family of partial orders called parallel bucket orders is considered. A partial order scheme that provably yields an optimal space-time tradeoff within parallel bucket orders is presented. Practical issues concerning parallel bucket orders are also discussed. Learning ancestor relations, that is, directed paths between nodes, is motivated by the need for robust summaries of the network structures when there are unobserved nodes at work. Ancestor relations are nonmodular features and hence learning them is more difficult than learning modular features. A dynamic programming algorithm is presented for computing posterior probabilities of ancestor relations exactly. Empirical tests suggest that ancestor relations can be learned from observational data almost as accurately as arcs even in the presence of unobserved nodes.
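The space problem of the standard dynamic programming approach can be seen in a generic sketch for permutation problems. This is an illustration only, not the thesis's algorithm: `best_ordering_cost` and the `cost(v, preds_mask)` callback are hypothetical names, with `cost` standing in for whatever local score an element receives given the set of elements placed before it.

```python
def best_ordering_cost(n, cost):
    """Minimum total cost over all orderings of n elements (illustrative).

    cost(v, preds_mask) is the cost of placing element v after exactly the
    elements whose bits are set in preds_mask. The table `best` has one entry
    per subset, i.e. 2**n entries, which is the space bottleneck.
    """
    full = 1 << n
    best = [0.0] * full
    for S in range(1, full):
        # Choose which element of S comes last; the rest of S precedes it.
        best[S] = min(
            best[S & ~(1 << v)] + cost(v, S & ~(1 << v))
            for v in range(n)
            if (S >> v) & 1
        )
    return best[full - 1]
```

The running time is proportional to n times 2^n, and so is the table size up to the factor n; schemes that trade space against time aim to avoid holding the whole table at once.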