17 resultados para Genetic programming (Computer science)

em Helda - Digital Repository of University of Helsinki


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved with the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of the cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. In further inquiry, the set of highest scoring putative cis-regulatory elements were found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating the neutral DNA evolution. The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight to the transcription level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyzes is stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) are another large part of the genetic variance between individuals. They are also typically abundant and their measuring is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations whereas the deletion-detection algorithm uses the EM-algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions, deletions and also correctly detecting known such rearrangements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reuse of existing carefully designed and tested software improves the quality of new software systems and reduces their development costs. Object-oriented frameworks provide an established means for software reuse on the levels of both architectural design and concrete implementation. Unfortunately, due to frame-works complexity that typically results from their flexibility and overall abstract nature, there are severe problems in using frameworks. Patterns are generally accepted as a convenient way of documenting frameworks and their reuse interfaces. In this thesis it is argued, however, that mere static documentation is not enough to solve the problems related to framework usage. Instead, proper interactive assistance tools are needed in order to enable system-atic framework-based software production. This thesis shows how patterns that document a framework s reuse interface can be represented as dependency graphs, and how dynamic lists of programming tasks can be generated from those graphs to assist the process of using a framework to build an application. This approach to framework specialization combines the ideas of framework cookbooks and task-oriented user interfaces. Tasks provide assistance in (1) cre-ating new code that complies with the framework reuse interface specification, (2) assuring the consistency between existing code and the specification, and (3) adjusting existing code to meet the terms of the specification. Besides illustrating how task-orientation can be applied in the context of using frameworks, this thesis describes a systematic methodology for modeling any framework reuse interface in terms of software patterns based on dependency graphs. The methodology shows how framework-specific reuse interface specifi-cations can be derived from a library of existing reusable pattern hierarchies. Since the methodology focuses on reusing patterns, it also alleviates the recog-nized problem of framework reuse interface specification becoming complicated and unmanageable for frameworks of realistic size. The ideas and methods proposed in this thesis have been tested through imple-menting a framework specialization tool called JavaFrames. JavaFrames uses role-based patterns that specify a reuse interface of a framework to guide frame-work specialization in a task-oriented manner. This thesis reports the results of cases studies in which JavaFrames and the hierarchical framework reuse inter-face modeling methodology were applied to the Struts web application frame-work and the JHotDraw drawing editor framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sensor networks represent an attractive tool to observe the physical world. Networks of tiny sensors can be used to detect a fire in a forest, to monitor the level of pollution in a river, or to check on the structural integrity of a bridge. Application-specific deployments of static-sensor networks have been widely investigated. Commonly, these networks involve a centralized data-collection point and no sharing of data outside the organization that owns it. Although this approach can accommodate many application scenarios, it significantly deviates from the pervasive computing vision of ubiquitous sensing where user applications seamlessly access anytime, anywhere data produced by sensors embedded in the surroundings. With the ubiquity and ever-increasing capabilities of mobile devices, urban environments can help give substance to the ubiquitous sensing vision through Urbanets, spontaneously created urban networks. Urbanets consist of mobile multi-sensor devices, such as smart phones and vehicular systems, public sensor networks deployed by municipalities, and individual sensors incorporated in buildings, roads, or daily artifacts. My thesis is that "multi-sensor mobile devices can be successfully programmed to become the underpinning elements of an open, infrastructure-less, distributed sensing platform that can bring sensor data out of their traditional close-loop networks into everyday urban applications". Urbanets can support a variety of services ranging from emergency and surveillance to tourist guidance and entertainment. For instance, cars can be used to provide traffic information services to alert drivers to upcoming traffic jams, and phones to provide shopping recommender services to inform users of special offers at the mall. Urbanets cannot be programmed using traditional distributed computing models, which assume underlying networks with functionally homogeneous nodes, stable configurations, and known delays. Conversely, Urbanets have functionally heterogeneous nodes, volatile configurations, and unknown delays. Instead, solutions developed for sensor networks and mobile ad hoc networks can be leveraged to provide novel architectures that address Urbanet-specific requirements, while providing useful abstractions that hide the network complexity from the programmer. This dissertation presents two middleware architectures that can support mobile sensing applications in Urbanets. Contory offers a declarative programming model that views Urbanets as a distributed sensor database and exposes an SQL-like interface to developers. Context-aware Migratory Services provides a client-server paradigm, where services are capable of migrating to different nodes in the network in order to maintain a continuous and semantically correct interaction with clients. Compared to previous approaches to supporting mobile sensing urban applications, our architectures are entirely distributed and do not assume constant availability of Internet connectivity. In addition, they allow on-demand collection of sensor data with the accuracy and at the frequency required by every application. These architectures have been implemented in Java and tested on smart phones. They have proved successful in supporting several prototype applications and experimental results obtained in ad hoc networks of phones have demonstrated their feasibility with reasonable performance in terms of latency, memory, and energy consumption.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The analysis of sequential data is required in many diverse areas such as telecommunications, stock market analysis, and bioinformatics. A basic problem related to the analysis of sequential data is the sequence segmentation problem. A sequence segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has smaller running time than the optimal dynamic programming algorithm, while it has bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. While in the standard segmentation of a multidimensional sequence all dimensions share the same segment boundaries, in a clustered segmentation the multidimensional sequence is segmented in such a way that dimensions are allowed to form clusters. Each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentations and we experimentally show that segmenting sequences using this segmentation model, leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation to the segmentation problem: in addition to partitioning the sequence we also seek to apply a limited amount of reordering, so that the overall representation error is minimized. We formulate the problem of segmentation with rearrangements and we show that it is an NP-hard problem to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms in sequences. In the final part of the thesis, we discuss the problem of aggregating results of segmentation algorithms on the same set of data points. In this case, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, XML has been widely adopted as a universal format for structured data. A variety of XML-based systems have emerged, most prominently SOAP for Web services, XMPP for instant messaging, and RSS and Atom for content syndication. This popularity is helped by the excellent support for XML processing in many programming languages and by the variety of XML-based technologies for more complex needs of applications. Concurrently with this rise of XML, there has also been a qualitative expansion of the Internet's scope. Namely, mobile devices are becoming capable enough to be full-fledged members of various distributed systems. Such devices are battery-powered, their network connections are based on wireless technologies, and their processing capabilities are typically much lower than those of stationary computers. This dissertation presents work performed to try to reconcile these two developments. XML as a highly redundant text-based format is not obviously suitable for mobile devices that need to avoid extraneous processing and communication. Furthermore, the protocols and systems commonly used in XML messaging are often designed for fixed networks and may make assumptions that do not hold in wireless environments. This work identifies four areas of improvement in XML messaging systems: the programming interfaces to the system itself and to XML processing, the serialization format used for the messages, and the protocol used to transmit the messages. We show a complete system that improves the overall performance of XML messaging through consideration of these areas. The work is centered on actually implementing the proposals in a form usable on real mobile devices. The experimentation is performed on actual devices and real networks using the messaging system implemented as a part of this work. The experimentation is extensive and, due to using several different devices, also provides a glimpse of what the performance of these systems may look like in the future.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data is available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example are genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from the traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither the statistical dependency nor the statistical significance are monotonic properties, which means that the traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measures. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measures, like Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm is well-scalable, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or if the data still contains better, but undiscovered dependencies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study examines various uses of computer technology in acquisition of information for visually impaired people. For this study 29 visually impaired persons took part in a survey about their experiences concerning acquisition of infomation and use of computers, especially with a screen magnification program, a speech synthesizer and a braille display. According to the responses, the evolution of computer technology offers an important possibility for visually impaired people to cope with everyday activities and interacting with the environment. Nevertheless, the functionality of assistive technology needs further development to become more usable and versatile. Since the challenges of independent observation of environment were emphasized in the survey, the study led into developing a portable text vision system called Tekstinäkö. Contrary to typical stand-alone applications, Tekstinäkö system was constructed by combining devices and programs that are readily available on consumer market. As the system operates, pictures are taken by a digital camera and instantly transmitted to a text recognition program in a laptop computer that talks out loud the text using a speech synthesizer. Visually impaired test users described that even unsure interpretations of the texts in the environment given by Tekstinäkö system are at least a welcome addition to complete perception of the environment. It became clear that even with a modest development work it is possible to bring new, useful and valuable methods to everyday life of disabled people. Unconventional production process of the system appeared to be efficient as well. Achieved results and the proposed working model offer one suggestion for giving enough attention to easily overlooked needs of the people with special abilities. ACM Computing Classification System (1998): K.4.2 Social Issues: Assistive technologies for persons with disabilities I.4.9 Image processing and computer vision: Applications Keywords: Visually impaired, computer-assisted, information, acquisition, assistive technology, computer, screen magnification program, speech synthesizer, braille display, survey, testing, text recognition, camera, text, perception, picture, environment, trasportation, guidance, independence, vision, disabled, blind, speech, synthesizer, braille, software engineering, programming, program, system, freeware, shareware, open source, Tekstinäkö, text vision, TopOCR, Autohotkey, computer engineering, computer science

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, XML has been accepted as the format of messages for several applications. Prominent examples include SOAP for Web services, XMPP for instant messaging, and RSS and Atom for content syndication. This XML usage is understandable, as the format itself is a well-accepted standard for structured data, and it has excellent support for many popular programming languages, so inventing an application-specific format no longer seems worth the effort. Simultaneously with this XML's rise to prominence there has been an upsurge in the number and capabilities of various mobile devices. These devices are connected through various wireless technologies to larger networks, and a goal of current research is to integrate them seamlessly into these networks. These two developments seem to be at odds with each other. XML as a fully text-based format takes up more processing power and network bandwidth than binary formats would, whereas the battery-powered nature of mobile devices dictates that energy, both in processing and transmitting, be utilized efficiently. This thesis presents the work we have performed to reconcile these two worlds. We present a message transfer service that we have developed to address what we have identified as the three key issues: XML processing at the application level, a more efficient XML serialization format, and the protocol used to transfer messages. Our presentation includes both a high-level architectural view of the whole message transfer service, as well as detailed descriptions of the three new components. These components consist of an API, and an associated data model, for XML processing designed for messaging applications, a binary serialization format for the data model of the API, and a message transfer protocol providing two-way messaging capability with support for client mobility. We also present relevant performance measurements for the service and its components. As a result of this work, we do not consider XML to be inherently incompatible with mobile devices. As the fixed networking world moves toward XML for interoperable data representation, so should the wireless world also do to provide a better-integrated networking infrastructure. However, the problems that XML adoption has touch all of the higher layers of application programming, so instead of concentrating simply on the serialization format we conclude that improvements need to be made in an integrated fashion in all of these layers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Keuhkosyöpä on yleisimpiä syöpätauteja. Se jaetaan kahteen päätyyppiin: pienisoluiseen ja ei-pienisoluiseen keuhkosyöpään. Ei-pienisoluinen keuhkosyöpä jaetaan lisäksi alatyyppeihin, joista suurimmat ovat levyepiteeli-, adeno- ja suurisoluinen karsinooma. Keuhkosyövän tärkein riskitekijä on tupakointi, mutta muutkin työ- ja elinympäristön altisteet, kuten asbesti, voivat johtaa syöpään. Väitöstyössä tutkittiin kahdenlaisten keuhkosyöpäryhmien erityispiirteitä. Työssä kartoitettiin, onko löydettävissä muutoksia, jotka erottavat asbestikeuhkosyövät muista syövistä sekä luuytimeen varhaisessa vaiheessa leviävät keuhkosyövät leviämättömistä syövistä. Tutkimusten ensimmäisessä vaiheessa käytettiin mikrosirupohjaisia menetelmiä, jotka mahdollistavat jopa kaikkien geenien tarkastelun yhden kokeen avulla. Vertailevien mikrosirututkimusten avulla on mahdollista paikantaa geenejä tai kromosomialueita, joiden muutokset erottelevat ryhmät toisistaan. Asbestiin liittyvissä tutkimuksissa paikannettiin kuusi kromosomialuetta, joissa geenien kopiolukumäärän sekä ilmenemistason muutokset erottelivat potilaat altistushistorian mukaan. Riippumattomilla laboratoriomenetelmillä tehtyjen jatkoanalyysien avulla pystyttiin varmistamaan, että 19p-alueen häviämä oli yhteydessä asbestialtistukseen. Työssä osoitettiin myös, että 19p-alueen muutoksia voidaan indusoida altistamalla soluja asbestille in vitro. Tutkimuksessa saatiin lisäksi viitteitä asbestispesifisistä muutoksista signaalinvälitysreiteissä, sillä yhdessä toimivien geenien ilmentymisessä havaittiin eroja asbestille altistuneiden ja altistumattomien välillä. Vertailemalla luuytimeen syövän aikaisessa vaiheessa levinneiden ja leviämättömien keuhkoadenokarsinoomien muutosprofiileita toisiinsa, paikannettiin viisi aluetta, joilla geenien kopiolukumäärä- sekä ilmenemistason muutokset erottelivat ryhmät toisistaan. Jatkoanalyyseissä havaittiin, että 4q-alueen häviämää esiintyi adenokarsinoomien lisäksi levyepiteelikarsinoomiin, jotka olivat levinneet luuytimeen. Myös keuhkosyöpien aivometastaaseissa alue oli toistuvasti hävinnyt. Väitöstyön tutkimukset osoittavat, että vertailevien mikrosiruanalyysien avulla saadaan tietoa syöpäryhmien erityispiirteistä. Työssä saadut tulokset osoittavat, että 19p-alueen muutokset ovat tyypillisiä asbestikeuhkosyöville ja 4q-alueen muutokset luuytimeen aikaisessa vaiheessa leviäville keuhkosyöville.

Relevância:

100.00% 100.00%

Publicador: