526 results for "työn kehittäminen"
Abstract:
Precipitation-induced runoff and leaching from milled peat mining mires by peat types: a comparative method for estimating the loading of water bodies during peat production. This research project in environmental geology has arisen out of an observed need to be able to predict more accurately the loading of watercourses with detrimental organic substances and nutrients from already existing and planned peat production areas, since the authorities' capacity for insisting on such predictions covering the whole duration of peat production in connection with evaluations of environmental impact is at present highly limited. National and international decisions regarding monitoring of the condition of watercourses and their improvement and restoration require more sophisticated evaluation methods in order to be able to forecast watercourse loading and its environmental impacts at the stage of land-use planning and preparations for peat production. The present project thus set out from the premise that it would be possible, on the basis of existing mire and peat data, to construct estimates of the typical loading from production mires over the whole duration of their exploitation. Finland has some 10 million hectares of peatland, accounting for almost a third of its total area. Macroclimatic conditions have varied in the course of the Holocene growth and development of this peatland, and with them the habitats of the peat-forming plants. Temperatures and moisture conditions have played a significant role in determining the dominant species of mire plants growing there at any particular time, the resulting mire types and the accumulation and deposition of plant remains to form the peat. The above climatic, environmental and mire development factors, together with ditching, have contributed, and continue to contribute, to the existence of peat horizons that differ in their physical and chemical properties, leading to differences in material transport between peatlands in a natural state and mires that have been ditched or prepared for forestry and peat production. Watercourse loading from the ditching of mires or their use for peat production can have detrimental effects on river and lake environments and their recreational use, especially where oxygen-consuming organic solids and soluble organic substances and nutrients are concerned. It has not previously been possible, however, to estimate in advance the watercourse loading likely to arise from ditching and peat production on the basis of the characteristics of the peat in a mire, although earlier observations have indicated that watercourse loading from peat production can vary greatly and it has been suggested that differences in peat properties may be of significance in this. Sprinkling is used here in combination with simulations of conditions in a milled peat production area to determine the influence of the physical and chemical properties of milled peats in production mires on surface runoff into the drainage ditches and the concentrations of material in the runoff water. Sprinkling and extraction experiments were carried out on 25 samples of milled Carex (C) and Sphagnum (S) peat of humification grades H 2.5–8.5, with moisture contents in the range 23.4–89% at the commencement of the first sprinkling, which was followed by a second sprinkling 24 hours later.
The water retention capacity of the peat was best, and surface runoff lowest, with Sphagnum and Carex peat samples of humification grades H 2.5–6 in the moisture content class 56–75%. On account of the hydrophobicity of dry peat, runoff increased in a fairly regular manner with drying of the sample from 55% to 24–30%. Runoff from the samples with an original moisture content over 55% increased by 63% in the second round of sprinkling relative to the first, as they had practically reached saturation point on the first occasion, while those with an original moisture content below 55% retained their high runoff in the second round, due to continued hydrophobicity. The well-humified samples (H 6.5–8.5) with a moisture content over 80% showed a low water retention capacity and high runoff in both rounds of sprinkling. Loading of the runoff water with suspended solids, total phosphorus and total nitrogen, and also the chemical oxygen demand (CODMn O2), varied greatly in the sprinkling experiment, depending on the peat type and degree of humification, but concentrations of the same substances in the two sprinklings were closely or moderately closely correlated and these correlations were significant. The concentrations of suspended solids in the runoff water observed in the simulations of a peat production area and the direct surface runoff from it into the drainage ditch system in response to rain (sprinkling intensity 1.27 mm/min) varied c. 60-fold between the degrees of humification in the case of the Carex peats and c. 150-fold for the Sphagnum peats, while chemical oxygen demand varied c. 30-fold and c. 50-fold, respectively, total phosphorus c. 60-fold and c. 66-fold, total nitrogen c. 65-fold and c. 195-fold and ammonium nitrogen c. 90-fold and c. 30-fold. The increases in concentrations in the runoff water were very closely correlated with increases in humification of the peat. The correlations of the concentrations measured in extraction experiments (48 h) with peat type and degree of humification corresponded to those observed in the sprinkler experiments. The resulting figures for the surface runoff from a peat production area into the drainage ditches simulated by means of sprinkling and material concentrations in the runoff water were combined with statistics on the mean extent of daily rainfall (0–67 mm) during the frost-free period of the year (May–October) over an observation period of 30 years to yield typical annual loading figures (kg/ha) for suspended solids (SS), chemical oxygen demand of organic matter (CODMn O2), total phosphorus (tot. P) and total nitrogen (tot. N) entering the ditches with respect to milled Carex (C) and Sphagnum (S) peats of humification grades H 2.5–8.5. In order to calculate the loading of drainage ditches from a milled peat production mire with the aid of these annual comparative values (in kg/ha), information is required on the properties of the intended production mire and its peat. Once data are available on the area of the mire, its peat depth, peat types and their degrees of humification, dry matter content, calorific value and corresponding energy content, it is possible to produce mutually comparable estimates for individual mires with respect to the annual loading of the drainage ditch system and the surrounding watercourse for the whole service life of the production area, the duration of this service life, determinations of energy content and the amount of loading per unit of energy generated (kg/MWh).
In the 8 mires in the Köyhäjoki basin, Central Ostrobothnia, taken as an example, the loading of suspended solids (SS) in the drainage ditch networks calculated on the basis of the typical values obtained here and existing mire and peat data and expressed per unit of energy generated varied between the mires and horizons in the range 0.9–16.5 kg/MWh. One of the aims of this work was to develop means of making better use of existing mire and peat data and the results of corings and other field investigations. In this respect combination of the typical loading values (kg/ha) obtained here for S, SC, CS and C peats and the various degrees of humification (H 2.5–8.5) with the above mire and peat data by means of a computer program for the acquisition and handling of such data would enable all the information currently available and that deposited in the system in the future to be used for defining watercourse loading estimates for mires and comparing them with the corresponding estimates of energy content. The intention behind this work has been to respond to the challenge facing the energy generation industry to find larger peat production areas that exert less loading on the environment and to that facing the environmental authorities to improve the means available for estimating watercourse loading from peat production and its environmental impacts in advance. The results conform well to the initial hypothesis and to the goals laid down for the research and should enable watercourse loading from existing and planned peat production to be evaluated better in the future and the resulting impacts to be taken into account when planning land use and energy generation. The advance loading information available in this way would be of value in the selection of individual peat production areas, the planning of their exploitation, the introduction of water protection measures and the planning of loading inspections, in order to achieve controlled peat production that pays due attention to environmental considerations.
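To make the intended use of the comparative values concrete, the following minimal sketch shows how annual loading figures (kg/ha) for a given peat type and humification grade might be combined with mire data to yield loading per unit of energy (kg/MWh). All numbers and field names are hypothetical illustrations, not values from the study.

```python
# Hypothetical sketch: combine typical annual loading values (kg/ha) with
# mire data to estimate loading per unit of generated energy (kg/MWh).
# All numbers below are illustrative placeholders, not thesis results.

typical_ss_load_kg_per_ha_a = {   # keyed by (peat type, humification grade H)
    ("S", 3.0): 5.0,
    ("C", 7.0): 150.0,
}

def ss_load_per_mwh(peat_type, humification, area_ha,
                    service_life_years, energy_content_mwh):
    """Total suspended-solids loading over the service life, per MWh of energy."""
    annual_load_kg = typical_ss_load_kg_per_ha_a[(peat_type, humification)] * area_ha
    return annual_load_kg * service_life_years / energy_content_mwh

# Example: a hypothetical 50 ha production area of well-humified Carex peat.
print(ss_load_per_mwh("C", 7.0, area_ha=50,
                      service_life_years=20, energy_content_mwh=500_000))
```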
Abstract:
A new rock mass classification scheme, the Host Rock Classification system (HRC-system), has been developed for evaluating the suitability of volumes of rock mass for the disposal of high-level nuclear waste in Precambrian crystalline bedrock. To support the development of the system, the requirements of host rock to be used for disposal have been studied in detail and the significance of the various rock mass properties has been examined. The HRC-system considers both the long-term safety of the repository and the constructability in the rock mass. The system is specific to the KBS-3V disposal concept and can be used only at sites that have been evaluated to be suitable at the site scale. By using the HRC-system, it is possible to identify potentially suitable volumes within the site at several different scales (repository, tunnel and canister scales). The selection of the classification parameters to be included in the HRC-system is based on an extensive study of the rock mass properties and their various influences on the long-term safety, the constructability and the layout and location of the repository. The parameters proposed for the classification at the repository scale include fracture zones, strength/stress ratio, hydraulic conductivity and the Groundwater Chemistry Index. The parameters proposed for the classification at the tunnel scale include hydraulic conductivity, Q′ and fracture zones, and the parameters proposed for the classification at the canister scale include hydraulic conductivity, Q′, fracture zones, fracture width (aperture + filling) and fracture trace length. The parameter values will be used to determine the suitability classes for the volumes of rock to be classified. The HRC-system includes four suitability classes at the repository and tunnel scales and three suitability classes at the canister scale, and the classification process is linked to several important decisions regarding the location and acceptability of many components of the repository at all three scales. The HRC-system is thereby one possible design tool that aids in locating the different repository components in volumes of host rock that are more suitable than others and that are considered to fulfil the fundamental requirements set for the repository host rock. The generic HRC-system, which is the main result of this work, is also adjusted to the site-specific properties of the Olkiluoto site in Finland, and the classification procedure is demonstrated by a test classification using data from Olkiluoto. Keywords: host rock, classification, HRC-system, nuclear waste disposal, long-term safety, constructability, KBS-3V, crystalline bedrock, Olkiluoto
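The scale-specific parameter sets and class counts described above can be summarized compactly. The sketch below records only what is stated in the abstract; the actual decision rules and threshold values of the HRC-system are site- and concept-specific and are intentionally not reproduced here.

```python
# Summary of the HRC-system structure as described above: classification
# parameters per scale and the number of suitability classes at each scale.
# Decision rules and threshold values are omitted on purpose.
HRC_SCALES = {
    "repository": {
        "parameters": ["fracture zones", "strength/stress ratio",
                       "hydraulic conductivity", "Groundwater Chemistry Index"],
        "suitability_classes": 4,
    },
    "tunnel": {
        "parameters": ["hydraulic conductivity", "Q'", "fracture zones"],
        "suitability_classes": 4,
    },
    "canister": {
        "parameters": ["hydraulic conductivity", "Q'", "fracture zones",
                       "fracture width (aperture + filling)",
                       "fracture trace length"],
        "suitability_classes": 3,
    },
}
```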
Abstract:
In Finland one of the most important current issues in environmental management is the quality of surface waters. The increasing social importance of lakes and water systems has generated wide-ranging interest in lake restoration and management, concerning especially lakes suffering from eutrophication, but also from other environmental impacts. Most of the factors deteriorating the water quality in Finnish lakes are connected to human activities. Especially since the 1940s, intensified farming practices and the discharge of sewage waters from scattered settlements, cottages and industry have affected the lakes, which have simultaneously developed into recreational areas for a growing number of people. Therefore, this study was focused on small lakes that are human-impacted, located close to settlement areas and of significant value for the local population. The aim of this thesis was to obtain information from lake sediment records for ongoing lake restoration activities and to show that a well-planned, properly focused lake sediment study is an essential part of the work related to the evaluation, target setting and restoration of Finnish lakes. Altogether 11 lakes were studied. The study of Lake Kaljasjärvi was related to the gradual eutrophication of the lake. In lakes Ormajärvi, Suolijärvi, Lehee, Pyhäjärvi and Iso-Roine the main focus was on sediment mapping, as well as on long-term changes in sedimentation, which were compared with those in Lake Pääjärvi. In Lake Hormajärvi the roles of different kinds of sedimentation environments in the eutrophication development of the lake's two basins were compared. Lake Orijärvi has not been eutrophied, but ore exploitation and the related acid mine drainage from the catchment area have influenced the lake drastically, and the changes caused by the metal load were investigated. The twin lakes Etujärvi and Takajärvi are slightly eutrophied, but also suffer from problems associated with the erosion of the substantial peat accumulations covering the fringe areas of the lakes. These peat accumulations are related to Holocene water level changes, which were investigated. The methods used were chosen case-specifically for each lake. In general, acoustic soundings of the lakes, detailed description of the nature of the sediment and determinations of the physical properties of the sediment, such as water content, loss on ignition and magnetic susceptibility, were used, as was grain size analysis. A wide set of chemical analyses was also used. Diatom and chrysophycean cyst analyses were applied, and the diatom-inferred total phosphorus content was reconstructed. The results of these studies show that the ideal lake sediment study, as part of a lake management project, should be two-phased. In the first phase, thorough mapping of sedimentation patterns should be carried out by soundings and adequate corings. The actual sampling, based on the preliminary results, must include at least one long core from the main sedimentation basin for determining the natural background state of the lake. The recent, artificially impacted development of the lake can then be determined by short-core and surface sediment studies. The sampling must again be focused on the basis of the sediment mapping, and it should represent all the different sedimentation environments and bottom dynamic zones, considering the inlets and outlets, as well as the effects of possible point sources of loading to the lake.
In practice, the budget of lake management projects is usually limited and only the most essential work and analyses can be carried out. The set of chemical and biological analyses and dating methods must therefore be thoroughly considered and adapted to the specific management problem. The results also show that information obtained from a properly performed sediment study enhances the planning of the restoration, makes it possible to define the target of the remediation activities and improves the cost-efficiency of the project.
Abstract:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox provided here, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can also be applied under sampling from a finite population. The main emphasis here is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is also feasible in this case.
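As a schematic illustration of the distinction drawn above (the notation is illustrative, not taken from the articles), consider a two-phase design in which first-phase data Y_i are observed for all subjects, second-phase data Z_i only for a subset V selected by a known rule, and R_i indicates selection into V:

```latex
% Schematic comparison of full and conditional likelihood in a two-phase design.
L_{\mathrm{full}}(\theta) \;\propto\; \prod_{i \in V} p(Y_i, Z_i \mid \theta)\,
                                      \prod_{i \notin V} p(Y_i \mid \theta),
\qquad
L_{\mathrm{cond}}(\theta) \;\propto\; \prod_{i \in V} p(Y_i, Z_i \mid R_i = 1, \theta).
```

The full likelihood uses the partial information from subjects outside V, whereas the conditional likelihood restricts attention to the completely observed set and conditions on the rule that generated it.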
Abstract:
This thesis consists of an introduction, four research articles and an appendix. The thesis studies relations between two different approaches to the continuum limit of models of two-dimensional statistical mechanics at criticality. The approach of conformal field theory (CFT) could be thought of as the algebraic classification of some basic objects in these models. It has been successfully used by physicists since the 1980s. The other approach, Schramm-Loewner evolutions (SLEs), is a recently introduced set of mathematical methods to study random curves or interfaces occurring in the continuum limit of the models. The first and second included articles argue, on the basis of statistical mechanics, what a plausible relation between SLEs and conformal field theory would be. The first article studies multiple SLEs, several random curves simultaneously in a domain. The proposed definition is compatible with a natural commutation requirement suggested by Dubédat. The curves of multiple SLE may form different topological configurations, "pure geometries". We conjecture a relation between the topological configurations and the CFT concepts of conformal blocks and operator product expansions. Example applications of multiple SLEs include crossing probabilities for percolation and the Ising model. The second article studies SLE variants that represent models with boundary conditions implemented by primary fields. The best known of these, SLE(kappa, rho), is shown to be simple in terms of the Coulomb gas formalism of CFT. In the third article the space of local martingales for variants of SLE is shown to carry a representation of the Virasoro algebra. Finding this structure is guided by the relation of SLEs and CFTs in general, but the result is established in a straightforward fashion. This article, too, emphasizes multiple SLEs and proposes a possible way of treating pure geometries in terms of the Coulomb gas. The fourth article states results of applications of the Virasoro structure to the open questions of SLE reversibility and duality. Proofs of the stated results are provided in the appendix. The objective is an indirect computation of certain polynomial expected values. Provided that these expected values exist, in generic cases they are shown to possess the desired properties, thus giving support for both reversibility and duality.
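For orientation, chordal SLE(kappa) can be recalled (as standard background, not a result of the articles) as the Loewner chain

```latex
\partial_t g_t(z) \;=\; \frac{2}{g_t(z) - W_t}, \qquad g_0(z) = z, \qquad W_t = \sqrt{\kappa}\, B_t,
```

where B_t is a standard Brownian motion and the random curve is traced by the points at which g_t ceases to be defined. In the SLE(kappa, rho) variants mentioned above, the driving process W_t additionally receives drift terms from marked boundary points, which is what allows boundary conditions implemented by primary fields to be represented.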
Abstract:
The monograph dissertation deals with kernel integral operators and their mapping properties on Euclidean domains. The associated kernels are weakly singular, and examples of such kernels are given by Green functions of certain elliptic partial differential equations. It is well known that mapping properties of the corresponding Green operators can be used to deduce a priori estimates for the solutions of these equations. In the dissertation, natural size and cancellation conditions are quantified for kernels defined in domains. These kernels induce integral operators which are then composed with any partial differential operator of prescribed order, depending on the size of the kernel. Since the main object of study in this dissertation is the boundedness properties of such compositions, the main result is the characterization of their Lp-boundedness on suitably regular domains. When the aforementioned kernels are defined on the whole Euclidean space, their partial derivatives of prescribed order turn out to be so-called standard kernels that arise in connection with singular integral operators. The Lp-boundedness of singular integrals is characterized by the T1 theorem, which is originally due to David and Journé and was published in 1984 (Ann. of Math. 120). The main result in the dissertation can be interpreted as a T1 theorem for weakly singular integral operators. The dissertation also deals with special convolution-type weakly singular integral operators defined on Euclidean spaces.
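To fix ideas (this is a typical formulation, not the precise conditions quantified in the dissertation), a weakly singular kernel of order alpha, with 0 < alpha < n, on a domain in R^n satisfies size bounds such as

```latex
|K(x,y)| \;\le\; C\,|x-y|^{\alpha - n},
\qquad
|\partial_x^{\beta} K(x,y)| \;\le\; C_{\beta}\,|x-y|^{\alpha - n - |\beta|},
\quad x \neq y,
```

so that, roughly speaking, taking derivatives of order alpha brings the kernel to the critical size |x-y|^{-n} of a standard (Calderón-Zygmund) kernel. This is the sense in which compositions with differential operators of matching order are naturally compared with singular integrals, and in which the main result can be read as a T1 theorem for weakly singular integral operators.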
Abstract:
The focus of this study is on the statistical analysis of categorical responses where the response values are dependent on each other. The most typical example of this kind of dependence is when repeated responses have been obtained from the same study unit. For example, in Paper I, the response of interest is pneumococcal nasopharyngeal carriage (yes/no) in 329 children. For each child, the carriage is measured nine times during the first 18 months of life, and thus the repeated responses on each child cannot be assumed to be independent of each other. In the case of the above example, the interest typically lies in the carriage prevalence and in whether different risk factors affect the prevalence. Regression analysis is the established method for studying the effects of risk factors. In order to make correct inferences from the regression model, the associations between repeated responses need to be taken into account. The analysis of repeated categorical responses typically focuses on regression modelling. However, further insights can also be gained by investigating the structure of the association. The central theme in this study is the development of joint regression and association models. The analysis of repeated, or otherwise clustered, categorical responses is computationally difficult. Likelihood-based inference is often feasible only when the number of repeated responses for each study unit is small. In Paper IV, an algorithm is presented which substantially facilitates maximum likelihood fitting, especially when the number of repeated responses increases. In addition, a notable result arising from this work is the freely available software for likelihood-based estimation of clustered categorical responses.
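For readers who want a concrete picture of repeated binary responses with within-cluster association, the sketch below fits a standard marginal logistic model with an exchangeable working correlation (GEE) to synthetic data loosely resembling the carriage example. This is a common baseline approach, not the joint likelihood-based regression and association models developed in the thesis, and all data and variable names are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic clustered binary data: 329 children, 9 repeated measurements each.
rng = np.random.default_rng(0)
n_children, n_visits = 329, 9
child = np.repeat(np.arange(n_children), n_visits)
age = np.tile(np.arange(n_visits) * 2 + 2, n_children)      # age in months
u = rng.normal(0.0, 1.0, n_children)[child]                  # child-level heterogeneity
p = 1.0 / (1.0 + np.exp(-(-2.0 + 0.1 * age + u)))
df = pd.DataFrame({"child_id": child, "age_months": age,
                   "carriage": rng.binomial(1, p)})

# Marginal logistic regression that accounts for within-child association
# through an exchangeable working correlation structure.
model = smf.gee("carriage ~ age_months", groups="child_id", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```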
Abstract:
In cardiac myocytes (heart muscle cells), coupling of the electric signal known as the action potential to the contraction of the heart depends crucially on calcium-induced calcium release (CICR) in a microdomain known as the dyad. During CICR, the peak number of free calcium ions (Ca) present in the dyad is small, typically estimated to be within the range 1-100. Since the free Ca ions mediate CICR, noise in Ca signaling due to the small number of free calcium ions influences Excitation-Contraction (EC) coupling gain. Noise in Ca signaling is only one type of noise influencing cardiac myocytes; for example, the ion channels playing a central role in action potential propagation are stochastic machines, each of which gates more or less randomly, producing gating noise in the membrane currents. How various noise sources influence the macroscopic properties of a myocyte, and how noise is attenuated or taken advantage of, are largely open questions. In this thesis, the impact of noise on CICR, EC coupling and, more generally, the macroscopic properties of a cardiac myocyte is investigated at multiple levels of detail using mathematical models. Complementarily to the investigation of the impact of noise on CICR, computationally efficient yet spatially detailed models of CICR are developed. The results of this thesis show that (1) gating noise due to the high-activity mode of L-type calcium channels, which play a major role in CICR, may induce early after-depolarizations associated with polymorphic tachycardia, which is a frequent precursor to sudden cardiac death in heart failure patients; (2) an increased level of voltage noise typically increases action potential duration and skews the distribution of action potential durations toward long durations in cardiac myocytes; and (3) while a small number of Ca ions mediate CICR, Excitation-Contraction coupling is robust against this noise source, partly due to the shape of the ryanodine receptor protein structures present in the cardiac dyad.
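As a generic illustration of gating noise (not one of the models developed in the thesis), the following sketch simulates a small population of identical two-state channels with hypothetical opening and closing rates. With only a few tens of channels, the instantaneous open fraction fluctuates markedly around its deterministic limit.

```python
import numpy as np

# Generic two-state gating-noise illustration: each channel opens with rate
# alpha and closes with rate beta (hypothetical values), simulated with a
# small fixed time step. Few channels -> large relative fluctuations.
rng = np.random.default_rng(1)
alpha, beta = 5.0, 20.0          # opening / closing rates (1/s), hypothetical
dt, t_end = 1e-4, 1.0            # time step and duration (s)
n_channels = 20
open_state = np.zeros(n_channels, dtype=bool)
open_count = []

for _ in range(int(t_end / dt)):
    u = rng.random(n_channels)
    # open channels close with probability beta*dt, closed ones open with alpha*dt
    open_state = np.where(open_state, u > beta * dt, u < alpha * dt)
    open_count.append(open_state.sum())

open_count = np.array(open_count)
print("mean open fraction:", open_count.mean() / n_channels)
print("deterministic limit:", alpha / (alpha + beta))
print("std of open-channel count:", open_count.std())
```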
Towards a renewed Verkkokoulu: a user study as part of the development of Statistics Finland's learning materials
Abstract:
The starting point of the study is the development of statistical literacy and the skills needed to use statistics in web-based learning environments. The study arose from the need to develop Statistics Finland's Verkkokoulu (Web School). The main goal is to formulate recommendations for developing Verkkokoulu from a user-oriented perspective. The study examines how mathematics subject teachers use Statistics Finland's Verkkokoulu in their teaching and in planning their teaching, and what wishes they and the Statistics Finland staff working on Verkkokoulu have for its development. In addition, the use of Verkkokoulu is examined with the help of a visitor tracking service. The study seeks answers to how Verkkokoulu can be developed into a better service from the user's point of view. The research questions are addressed through qualitative analysis of data collected with an online questionnaire, an e-mail interview, thematic interviews and the visitor tracking service. The questionnaire for mathematics subject teachers was carried out in March 2009 and the interviews during autumn 2009. The period examined in the visitor tracking service is 1 September 2008 to 31 August 2009. According to the results, Verkkokoulu is poorly known among mathematics subject teachers. The learning materials of Verkkokoulu are used more for planning teaching than for teaching itself. Teachers wish above all for services that are grounded in everyday life, easy to use and suited to planning teaching, and that are also applicable to traditional classroom instruction. Statistics Finland staff consider the development of Verkkokoulu necessary, although challenging owing to the scarcity of the resources needed for it. In their view, Verkkokoulu should be updated and expanded with versatile learning material packages. The technical implementation should also be renewed. According to the visitor tracking data, Verkkokoulu is used more than Statistics Finland's website on average. Verkkokoulu is used more for checking individual facts than for studying larger topics. In conclusion, Statistics Finland's Verkkokoulu is a service that, with user-oriented development and marketing, can come to play a significant role in statistics education. To achieve this, sufficient resources must be found for its development.
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
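The connection alluded to above can be illustrated in the simplest special case (naive Bayes; the thesis treats more general Bayesian network classifiers) by writing the class posterior log-odds as

```latex
\log \frac{P(Y=1 \mid x)}{P(Y=0 \mid x)}
  \;=\; \log \frac{P(Y=1)}{P(Y=0)}
  \;+\; \sum_{j} \log \frac{P(x_j \mid Y=1)}{P(x_j \mid Y=0)},
```

which is additive in functions of the individual features and therefore, for many class-conditional families, has exactly the functional form of a logistic regression model.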
Abstract:
In recent years, XML has been widely adopted as a universal format for structured data. A variety of XML-based systems have emerged, most prominently SOAP for Web services, XMPP for instant messaging, and RSS and Atom for content syndication. This popularity is helped by the excellent support for XML processing in many programming languages and by the variety of XML-based technologies for more complex needs of applications. Concurrently with this rise of XML, there has also been a qualitative expansion of the Internet's scope. Namely, mobile devices are becoming capable enough to be full-fledged members of various distributed systems. Such devices are battery-powered, their network connections are based on wireless technologies, and their processing capabilities are typically much lower than those of stationary computers. This dissertation presents work performed to try to reconcile these two developments. XML as a highly redundant text-based format is not obviously suitable for mobile devices that need to avoid extraneous processing and communication. Furthermore, the protocols and systems commonly used in XML messaging are often designed for fixed networks and may make assumptions that do not hold in wireless environments. This work identifies four areas of improvement in XML messaging systems: the programming interfaces to the system itself and to XML processing, the serialization format used for the messages, and the protocol used to transmit the messages. We show a complete system that improves the overall performance of XML messaging through consideration of these areas. The work is centered on actually implementing the proposals in a form usable on real mobile devices. The experimentation is performed on actual devices and real networks using the messaging system implemented as a part of this work. The experimentation is extensive and, due to using several different devices, also provides a glimpse of what the performance of these systems may look like in the future.
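One of the interface-level concerns mentioned above, processing an XML message without materializing a full document tree on a memory-constrained device, can be illustrated with a small pull-style example using Python's standard library. This is a generic illustration only, not the programming interface developed in the dissertation.

```python
import io
import xml.etree.ElementTree as ET

# Generic illustration: incremental ("pull") XML parsing, which handles
# elements as they complete instead of building the whole tree in memory.
message = b'<msg><hdr to="node1"/><body><item id="1"/><item id="2"/></body></msg>'

for event, elem in ET.iterparse(io.BytesIO(message), events=("end",)):
    if elem.tag == "item":
        print("handled item", elem.attrib["id"])
        elem.clear()   # release the element once it has been handled
```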
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs results that are interpretable, and what is considered interpretable in data mining can be very different from what is considered interpretable in linear algebra. The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability, since the factor matrices are of the same type as the original matrix, and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
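A minimal sketch of the Boolean matrix product and of the reconstruction error used to judge a binary decomposition is given below. The factor matrices are chosen by hand for illustration; the heuristic algorithms of the thesis for finding them are not reproduced here.

```python
import numpy as np

# Boolean matrix product: (B o C)_ij = OR_k (B_ik AND C_kj).
# Reconstruction error = number of entries where the product disagrees with A.
def boolean_product(B, C):
    return (B.astype(int) @ C.astype(int)) > 0

def reconstruction_error(A, B, C):
    return int(np.sum(A.astype(bool) ^ boolean_product(B, C)))

A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])
C = np.array([[1, 1, 0],
              [0, 1, 1]])
print(boolean_product(B, C).astype(int))
print("error:", reconstruction_error(A, B, C))   # 0: exact Boolean rank-2 cover
```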
Abstract:
This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved in the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. In further inquiry, the set of highest-scoring putative cis-regulatory elements was found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two-Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating neutral DNA evolution. The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight into the transcription-level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyses are stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
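As background for what scoring a sequence-specific transcription factor binding site means computationally (this is a generic position weight matrix scan, not the conservation-aware dynamic programming algorithm of the thesis nor the EEL implementation), a candidate site can be scored as log-odds against a background distribution:

```python
import math

# Hypothetical position weight matrix (PWM) over 4 positions; each column
# gives the probability of A, C, G, T at that position of the binding site.
PWM = [
    {"A": 0.80, "C": 0.10, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.10, "T": 0.80},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
BACKGROUND = 0.25  # uniform background base frequency

def score_site(site):
    """Log-odds score of a candidate site against the background model."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))

sequence = "TTAGTAAGTC"
w = len(PWM)
hits = [(i, score_site(sequence[i:i + w])) for i in range(len(sequence) - w + 1)]
print(max(hits, key=lambda h: h[1]))   # best-scoring candidate site position
```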
Abstract:
In visual object detection and recognition, classifiers have two interesting characteristics: accuracy and speed. Accuracy depends on the complexity of the image features and classifier decision surfaces. Speed depends on the hardware and the computational effort required to use the features and decision surfaces. When attempts to increase accuracy lead to increases in complexity and effort, it is necessary to ask how much we are willing to pay for increased accuracy. For example, if increased computational effort implies quickly diminishing returns in accuracy, then those designing inexpensive surveillance applications cannot aim for maximum accuracy at any cost. It becomes necessary to find trade-offs between accuracy and effort. We study efficient classification of images depicting real-world objects and scenes. Classification is efficient when a classifier can be controlled so that the desired trade-off between accuracy and effort (speed) is achieved and unnecessary computations are avoided on a per-input basis. A framework is proposed for understanding and modeling efficient classification of images. Classification is modeled as a tree-like process. In designing the framework, it is important to recognize what is essential and to avoid structures that are narrow in applicability. Earlier frameworks are lacking in this regard. The overall contribution is two-fold. First, the framework is presented, subjected to experiments, and shown to be satisfactory. Second, certain unconventional approaches are experimented with. This allows the separation of the essential from the conventional. To determine if the framework is satisfactory, three categories of questions are identified: trade-off optimization, classifier tree organization, and rules for delegation and confidence modeling. Questions and problems related to each category are addressed and empirical results are presented. For example, related to trade-off optimization, we address the problem of computational bottlenecks that limit the range of trade-offs. We also ask if accuracy versus effort trade-offs can be controlled after training. As another example, regarding classifier tree organization, we first consider the task of organizing a tree in a problem-specific manner. We then ask if problem-specific organization is necessary.
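The accuracy-versus-effort idea can be made concrete with a small two-stage sketch (a generic cascade, not the framework proposed in the thesis): a cheap classifier handles inputs it is confident about and delegates the rest to a more expensive one, with the confidence threshold controlling the trade-off on a per-input basis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data and two classifiers of very different computational cost.
X, y = make_classification(n_samples=4000, n_features=30, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
cheap = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
expensive = RandomForestClassifier(n_estimators=300, random_state=0).fit(Xtr, ytr)

def cascade_predict(X, threshold):
    """Use the cheap classifier when confident, delegate the rest."""
    proba = cheap.predict_proba(X)
    confident = np.max(proba, axis=1) >= threshold
    pred = np.argmax(proba, axis=1)
    if (~confident).any():
        pred[~confident] = expensive.predict(X[~confident])
    return pred, confident.mean()

for threshold in (0.6, 0.8, 0.95):
    pred, frac_cheap = cascade_predict(Xte, threshold)
    print(f"threshold={threshold}: accuracy={np.mean(pred == yte):.3f}, "
          f"handled cheaply={frac_cheap:.2f}")
```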
Abstract:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in an online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
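As a toy illustration of the notion of bases (significant locations), and not the online algorithm developed in the thesis, one can accumulate the time spent in each cell between transition events and flag cells that account for a large share of the total time:

```python
from collections import defaultdict

# Illustrative sketch only: events are (cell_id, transition_time_in_seconds).
# Dwell time in a cell is the time until the next transition; cells with a
# large share of total dwell time are candidate "bases".
events = [("A", 0), ("B", 3600), ("A", 4200), ("C", 30000), ("A", 31000)]

dwell = defaultdict(float)
for (cell, t), (_, t_next) in zip(events, events[1:]):
    dwell[cell] += t_next - t

total = sum(dwell.values())
bases = [cell for cell, d in dwell.items() if d / total >= 0.25]
print({c: round(d / total, 2) for c, d in dwell.items()}, "bases:", bases)
```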