181 results for Datavetenskap (datalogi)


Relevance:

10.00%

Publisher:

Abstract:

Place identification refers to the process of analyzing sensor data in order to detect places, i.e., spatial areas that are linked with activities and associated with meanings. Place information can be used, e.g., to provide awareness cues in applications that support social interactions, to provide personalized and location-sensitive information to the user, and to support mobile user studies by providing cues about the situations the study participant has encountered. Regularities in human movement patterns make it possible to detect personally meaningful places by analyzing location traces of a user. This thesis focuses on providing system-level support for place identification, as well as on algorithmic issues related to the place identification process. The move from location to place requires interactions between location sensing technologies (e.g., GPS or GSM positioning), algorithms that identify places from location data, and applications and services that utilize place information. These interactions can be facilitated using a mobile platform, i.e., an application or framework that runs on a mobile phone. For the purposes of this thesis, mobile platforms automate data capture and processing and provide means for disseminating data to applications and other system components. The first contribution of the thesis is BeTelGeuse, a freely available, open-source mobile platform that supports multiple runtime environments. The actual place identification process can be understood as a data analysis task where the goal is to analyze (location) measurements and to identify areas that are meaningful to the user. The second contribution of the thesis is the Dirichlet Process Clustering (DPCluster) algorithm, a novel place identification algorithm. The performance of the DPCluster algorithm is evaluated using twelve different datasets that have been collected by different users, at different locations and over different periods of time.
As part of the evaluation we compare the DPCluster algorithm against other state-of-the-art place identification algorithms. The results indicate that the DPCluster algorithm provides improved generalization performance against spatial and temporal variations in location measurements.

Relevance:

10.00%

Publisher:

Abstract:

The metabolism of an organism consists of a network of biochemical reactions that transform small molecules, or metabolites, into others in order to produce energy and building blocks for essential macromolecules. The goal of metabolic flux analysis is to uncover the rates, or the fluxes, of those biochemical reactions. In a steady state, the sum of the fluxes that produce an internal metabolite is equal to the sum of the fluxes that consume the same molecule. Thus the steady state imposes linear balance constraints on the fluxes. In general, the balance constraints imposed by the steady state are not sufficient to uncover all the fluxes of a metabolic network. The fluxes through cycles and alternative pathways between the same source and target metabolites remain unknown. More information about the fluxes can be obtained from isotopic labelling experiments, where a cell population is fed with labelled nutrients, such as glucose that contains 13C atoms. Labels are then transferred by biochemical reactions to other metabolites. The relative abundances of different labelling patterns in internal metabolites depend on the fluxes of the pathways producing them. Thus, the relative abundances of different labelling patterns contain information about the fluxes that cannot be uncovered from the balance constraints derived from the steady state. The field of research that estimates the fluxes using measured constraints on the relative abundances of different labelling patterns induced by 13C labelled nutrients is called 13C metabolic flux analysis. There are two approaches to 13C metabolic flux analysis. In the optimization approach, a non-linear optimization task is constructed in which candidate fluxes are iteratively generated until they fit the measured abundances of different labelling patterns.
In the direct approach, linear balance constraints given by the steady state are augmented with linear constraints derived from the abundances of different labelling patterns of metabolites. Thus, mathematically involved non-linear optimization methods that can get stuck in local optima can be avoided. On the other hand, the direct approach may require more measurement data than the optimization approach to obtain the same flux information. Furthermore, the optimization framework can easily be applied regardless of the labelling measurement technology and with all network topologies. In this thesis we present a formal computational framework for direct 13C metabolic flux analysis. The aim of our study is to construct as many linear constraints on the fluxes as possible from the 13C labelling measurements, using only computational methods that avoid non-linear techniques and are independent of the type of measurement data, the labelling of external nutrients and the topology of the metabolic network. The presented framework is the first representative of the direct approach for 13C metabolic flux analysis that is free from restricting assumptions made about these parameters. In our framework, measurement data is first propagated from the measured metabolites to other metabolites. The propagation is facilitated by the flow analysis of metabolite fragments in the network. Then new linear constraints on the fluxes are derived from the propagated data by applying the techniques of linear algebra. Based on the results of the fragment flow analysis, we also present an experiment planning method that selects sets of metabolites whose relative abundances of different labelling patterns are most useful for 13C metabolic flux analysis. Furthermore, we give computational tools to process raw 13C labelling data produced by tandem mass spectrometry into a form suitable for 13C metabolic flux analysis.
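
The steady-state balance described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the thesis: the toy network, the flux names v0..v3, and the extra "labelling" constraint (v1 = 4 * v2) are hypothetical. It shows that the balance constraints alone leave the split between two alternative pathways open, and that one additional linear constraint removes a degree of freedom.

```python
from fractions import Fraction

# Hypothetical toy network (not from the thesis):
#   v0: uptake -> A,  v1: A -> B,  v2: A -> B (alternative pathway),  v3: B -> export
# Rows are internal metabolites (A, B); columns are fluxes v0..v3.
S = [
    [1, -1, -1, 0],   # balance of A: produced by v0, consumed by v1 and v2
    [0, 1, 1, -1],    # balance of B: produced by v1 and v2, consumed by v3
]

def rank(matrix):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def free_dimensions(constraints):
    """Number of flux degrees of freedom left open by the linear constraints."""
    return len(constraints[0]) - rank(constraints)

def is_steady_state(stoich, fluxes):
    """True if every internal metabolite is produced as fast as it is consumed."""
    return all(sum(c * v for c, v in zip(row, fluxes)) == 0 for row in stoich)

# Two different splits between the alternative pathways both satisfy the balances.
v_a = [10, 8, 2, 10]
v_b = [10, 3, 7, 10]

# Assumed extra linear constraint standing in for labelling data: v1 - 4 * v2 = 0.
augmented = S + [[0, 1, -4, 0]]

print(free_dimensions(S), free_dimensions(augmented))
print(is_steady_state(S, v_a), is_steady_state(S, v_b))
```

In this toy case the two balance rows leave two free dimensions (the overall scale and the pathway split); the assumed extra row leaves only the scale.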

Relevance:

10.00%

Publisher:

Abstract:

The TCP protocol is used by most Internet applications today, including the recent mobile wireless terminals that use TCP for their World-Wide Web, E-mail and other traffic. The recent wireless network technologies, such as GPRS, are known to cause delay spikes in packet transfer. These spikes cause unnecessary TCP retransmission timeouts. This dissertation proposes a mechanism, Forward RTO-Recovery (F-RTO), for detecting unnecessary TCP retransmission timeouts, thus allowing TCP to take appropriate follow-up actions. We analyze a Linux F-RTO implementation in various network scenarios and investigate different alternatives to the basic algorithm. The second part of this dissertation focuses on quickly adapting TCP's transmission rate when the underlying link characteristics change suddenly. This can happen, for example, due to vertical hand-offs between GPRS and WLAN wireless technologies. We investigate the Quick-Start algorithm that, in collaboration with the network routers, aims to quickly probe the available bandwidth on a network path and allow TCP's congestion control algorithms to use that information. Through extensive simulations we study the different router algorithms and parameters for Quick-Start, and discuss the challenges Quick-Start faces in the current Internet. We also study the performance of Quick-Start when applied to vertical hand-offs between different wireless link technologies.

Relevance:

10.00%

Publisher:

Abstract:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures for problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem: searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. Classical examples are genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine, i.e., they should also hold in future data. This is an important distinction from the traditional association rules, which, in spite of their name and a similar appearance to dependency rules, do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither the statistical dependency nor the statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaved statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measures, like Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better, but undiscovered dependencies.
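
To make the significance measure concrete, the following minimal sketch scores a single rule X->A (or X->not A) with a one-sided Fisher's exact test computed from four counts. It covers only the scoring of one rule, not the search or pruning described in the abstract; the function and parameter names are assumptions chosen for illustration.

```python
from math import comb

def fisher_p_positive(n, n_x, n_a, n_xa):
    """One-sided Fisher's exact p-value for the rule X -> A.

    n    : total number of rows
    n_x  : rows where all attributes in X are true
    n_a  : rows where A is true
    n_xa : rows where both X and A are true

    Returns the hypergeometric probability of seeing at least n_xa
    co-occurrences if X and A were independent (margins held fixed).
    """
    upper = min(n_x, n_a)
    total = comb(n, n_x)
    tail = sum(comb(n_a, k) * comb(n - n_a, n_x - k)
               for k in range(n_xa, upper + 1))
    return tail / total

def fisher_p_negative(n, n_x, n_a, n_xa):
    """One-sided p-value for the rule X -> not A, by swapping A with its complement."""
    return fisher_p_positive(n, n_x, n - n_a, n_x - n_xa)

# X and A each hold in 5 of 10 rows and co-occur in 4 of them.
print(fisher_p_positive(10, 5, 5, 4))
```

A smaller p-value indicates a more significant dependency, so a search over rules would rank candidates by this value.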

Relevance:

10.00%

Publisher:

Abstract:

This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved in the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of the cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. On further inquiry, the set of highest-scoring putative cis-regulatory elements was found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating the neutral DNA evolution.
The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight into the transcription level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyses is stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.

Relevance:

10.00%

Publisher:

Abstract:

Ubiquitous computing is about making computers and computerized artefacts a pervasive part of our everyday lives, bringing more and more activities into the realm of information. The computationalization and informationalization of everyday activities increase not only our reach, efficiency and capabilities but also the amount and kinds of data gathered about us and our activities. In this thesis, I explore how information systems can be constructed so that they handle this personal data in a reasonable manner. The thesis provides two kinds of results: on one hand, tools and methods for both the construction and the evaluation of ubiquitous and mobile systems; on the other hand, an evaluation of the privacy aspects of a ubiquitous social awareness system. The work emphasises real-world experiments as the most important way to study privacy. Additionally, the state of current information systems as regards data protection is studied. The tools and methods in this thesis consist of three distinct contributions. An algorithm for locationing in cellular networks is proposed that does not require the location information to be revealed beyond the user's terminal. A prototyping platform for the creation of context-aware ubiquitous applications called ContextPhone is described and released as open source. Finally, a set of methodological findings for the use of smartphones in social scientific field research is reported. A central contribution of this thesis is the set of pragmatic tools that allow other researchers to carry out experiments. The evaluation of the ubiquitous social awareness application ContextContacts covers both the usage of the system in general and an analysis of privacy implications. The usage of the system is analyzed in the light of how users make inferences about others based on real-time contextual cues mediated by the system, based on several long-term field studies.
The analysis of privacy implications draws together the social psychological theory of self-presentation and research in privacy for ubiquitous computing, deriving a set of design guidelines for such systems. The main findings from these studies can be summarized as follows: The fact that ubiquitous computing systems gather more data about users can be used not only to study the use of such systems in an effort to create better systems but also, more generally, to study phenomena previously unstudied, such as the dynamic change of social networks. Systems that let people create new ways of presenting themselves to others can be fun for the users, but the self-presentation requires several thoughtful design decisions that allow the manipulation of the image mediated by the system. Finally, the growing amount of computational resources available to the users can be used to allow them to use the data themselves, rather than just being passive subjects of data gathering.

Relevance:

10.00%

Publisher:

Abstract:

In visual object detection and recognition, classifiers have two interesting characteristics: accuracy and speed. Accuracy depends on the complexity of the image features and classifier decision surfaces. Speed depends on the hardware and the computational effort required to use the features and decision surfaces. When attempts to increase accuracy lead to increases in complexity and effort, it is necessary to ask how much we are willing to pay for increased accuracy. For example, if increased computational effort implies quickly diminishing returns in accuracy, then those designing inexpensive surveillance applications cannot aim for maximum accuracy at any cost. It becomes necessary to find trade-offs between accuracy and effort. We study efficient classification of images depicting real-world objects and scenes. Classification is efficient when a classifier can be controlled so that the desired trade-off between accuracy and effort (speed) is achieved and unnecessary computations are avoided on a per-input basis. A framework is proposed for understanding and modeling efficient classification of images. Classification is modeled as a tree-like process. In designing the framework, it is important to recognize what is essential and to avoid structures that are narrow in applicability. Earlier frameworks are lacking in this regard. The overall contribution is two-fold. First, the framework is presented, subjected to experiments, and shown to be satisfactory. Second, certain unconventional approaches are experimented with. This allows the separation of the essential from the conventional. To determine if the framework is satisfactory, three categories of questions are identified: trade-off optimization, classifier tree organization, and rules for delegation and confidence modeling. Questions and problems related to each category are addressed and empirical results are presented.
For example, related to trade-off optimization, we address the problem of computational bottlenecks that limit the range of trade-offs. We also ask if accuracy versus effort trade-offs can be controlled after training. For another example, regarding classifier tree organization, we first consider the task of organizing a tree in a problem-specific manner. We then ask if problem-specific organization is necessary.
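
The idea of delegation under a confidence rule can be sketched with a simple chain of classifiers, which is the simplest special case of a tree. This is an illustrative assumption, not the framework from the thesis: the stage costs, the toy classifiers on a scalar input, and the threshold rule are all hypothetical.

```python
def cascade_classify(x, stages, threshold):
    """Run classifiers from cheapest to most expensive; stop once confident.

    stages    : list of (cost, classifier) pairs, where classifier(x)
                returns (label, confidence) with confidence in [0, 1]
    threshold : minimum confidence needed to stop without delegating
    Returns (label, total_cost).
    """
    total_cost = 0
    label = None
    for cost, classifier in stages:
        total_cost += cost
        label, confidence = classifier(x)
        if confidence >= threshold:
            break  # confident enough: do not delegate to the next stage
    return label, total_cost

def cheap(x):
    """Hypothetical fast classifier: unsure for inputs near its boundary 0.5."""
    label = "object" if x > 0.5 else "background"
    return label, min(1.0, 2 * abs(x - 0.5))

def expensive(x):
    """Hypothetical slow classifier: always fully confident."""
    return ("object" if x > 0.45 else "background"), 1.0

STAGES = [(1, cheap), (10, expensive)]

print(cascade_classify(0.9, STAGES, threshold=0.6))   # easy input, cheap stage only
print(cascade_classify(0.48, STAGES, threshold=0.6))  # hard input, delegated
```

Changing `threshold` after the classifiers are fixed moves the operating point: a threshold of 0 always stops at the cheap stage, while a higher threshold spends more effort on inputs the cheap stage is unsure about.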

Relevance:

10.00%

Publisher:

Abstract:

Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in an online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
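
The data format described in this abstract, a sequence of (cell identifier, transition time) pairs, can be illustrated with a minimal sketch that sums the time spent in each cell. This is only a basic summary of such data, not the clustering, base detection, or prediction algorithms of the thesis; the function name, the cell names, and the use of seconds as the time unit are assumptions.

```python
from collections import defaultdict

def dwell_times(events):
    """Total time spent in each cell.

    events: list of (cell_id, transition_time) pairs sorted by time, where
            each pair says the phone switched to cell_id at that time.
    The stay in the last cell is open-ended, so it is not counted.
    """
    totals = defaultdict(float)
    for (cell, start), (_, end) in zip(events, events[1:]):
        totals[cell] += end - start
    return dict(totals)

# Hypothetical trace: times in seconds since the start of recording.
trace = [("A", 0), ("B", 60), ("A", 90), ("C", 400)]
print(dwell_times(trace))  # {'A': 370.0, 'B': 30.0}
```

Because the function makes a single pass over consecutive events, the same accumulation could be updated incrementally as new transitions arrive on the device.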