865 results for Københavns universitet. Botanisk have


Relevance:

20.00%

Publisher:

Abstract:

The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
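
To make the computational issue concrete, here is a minimal Python sketch (not from the dissertation) of the Bernoulli case: the NML normalizer is nominally a sum over all 2^n binary sequences, but it collapses to a sum over the sufficient statistic, i.e., the number of ones.

```python
from math import comb, log

def bernoulli_nml_normalizer(n: int) -> float:
    """Parametric complexity C(n) for the Bernoulli model class.

    The naive definition sums maximum-likelihood probabilities over all
    2^n binary sequences; grouping sequences by their count of ones k
    collapses this to an O(n) sum.
    """
    total = 0.0
    for k in range(n + 1):
        if k == 0 or k == n:
            total += 1.0  # ML probability is 1 for all-zero / all-one sequences
        else:
            p = k / n
            total += comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
    return total

def nml_code_length(k: int, n: int) -> float:
    """Stochastic complexity (in nats) of a binary sequence with k ones out of n."""
    p = k / n
    ml = (p ** k) * ((1 - p) ** (n - k)) if 0 < k < n else 1.0
    return -log(ml) + log(bernoulli_nml_normalizer(n))

print(nml_code_length(7, 20))
```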

Relevance:

20.00%

Publisher:

Abstract:

Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs results that are interpretable, and what counts as interpretable in data mining can be very different from what counts as interpretable in linear algebra. The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability, since the factor matrices are of the same type as the original matrix, and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
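
As a small illustration of the Boolean product mentioned above (a sketch with hypothetical toy matrices, not from the thesis): under Boolean arithmetic 1 + 1 = 1, so overlapping factors do not add up.

```python
import numpy as np

def boolean_matmul(B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Boolean product: entry (i, j) is 1 iff B[i, k] = C[k, j] = 1 for some k.

    Computed via the ordinary integer product followed by thresholding:
    any positive count of shared factors becomes 1.
    """
    return ((B.astype(int) @ C.astype(int)) > 0).astype(int)

# Hypothetical binary factor matrices.
B = np.array([[1, 0], [0, 1], [1, 1]])   # 3 x 2 binary factor
C = np.array([[1, 1, 0], [0, 1, 1]])     # 2 x 3 binary factor
print(boolean_matmul(B, C))
# [[1 1 0]
#  [0 1 1]
#  [1 1 1]]   <- entry (2, 1) is 1, not 2: overlapping factors do not add up
```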

Relevance:

20.00%

Publisher:

Abstract:

Place identification refers to the process of analyzing sensor data in order to detect places, i.e., spatial areas that are linked with activities and associated with meanings. Place information can be used, e.g., to provide awareness cues in applications that support social interactions, to provide personalized and location-sensitive information to the user, and to support mobile user studies by providing cues about the situations the study participant has encountered. Regularities in human movement patterns make it possible to detect personally meaningful places by analyzing the location traces of a user. This thesis focuses on providing system-level support for place identification, as well as on algorithmic issues related to the place identification process. The move from location to place requires interactions between location sensing technologies (e.g., GPS or GSM positioning), algorithms that identify places from location data, and applications and services that utilize place information. These interactions can be facilitated using a mobile platform, i.e., an application or framework that runs on a mobile phone. For the purposes of this thesis, mobile platforms automate data capture and processing and provide means for disseminating data to applications and other system components. The first contribution of the thesis is BeTelGeuse, a freely available, open-source mobile platform that supports multiple runtime environments. The actual place identification process can be understood as a data analysis task where the goal is to analyze (location) measurements and to identify areas that are meaningful to the user. The second contribution of the thesis is the Dirichlet Process Clustering (DPCluster) algorithm, a novel place identification algorithm. The performance of the DPCluster algorithm is evaluated using twelve different datasets that have been collected by different users, at different locations, and over different periods of time. As part of the evaluation we compare the DPCluster algorithm against other state-of-the-art place identification algorithms. The results indicate that the DPCluster algorithm provides improved generalization performance across spatial and temporal variations in location measurements.
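
The DPCluster algorithm itself is beyond a short sketch, but the shape of the place identification task can be illustrated with a naive distance-threshold clustering of GPS fixes. This is explicitly not DPCluster, and the radius parameter is hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) fixes."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def naive_places(fixes, radius_m=150):
    """Greedily group GPS fixes into candidate places.

    The first fix of each group serves as its anchor; later fixes join
    the first group whose anchor is within radius_m. `fixes` is a list
    of (lat, lon) tuples.
    """
    places = []  # list of (anchor, members)
    for fix in fixes:
        for anchor, members in places:
            if haversine_m(anchor, fix) < radius_m:
                members.append(fix)
                break
        else:
            places.append((fix, [fix]))
    return places
```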

Relevance:

20.00%

Publisher:

Abstract:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures for problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem: searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine, i.e., they should also hold in future data. This is an important distinction from traditional association rules, which, in spite of their name and a similar appearance to dependency rules, do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better, but undiscovered, dependencies.
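
For instance, the significance of a single candidate rule X->A can be assessed with Fisher's exact test on a 2x2 contingency table. A hedged sketch using SciPy with hypothetical counts; the thesis's contribution is the pruned search over all rules, not this single-rule test.

```python
from scipy.stats import fisher_exact

def rule_significance(n_xa, n_x, n_a, n):
    """p-value of the dependency rule X -> A under Fisher's exact test.

    n_xa: rows where X and A both hold; n_x: rows where X holds;
    n_a: rows where A holds; n: total rows (all counts hypothetical).
    A one-sided test checks for a positive dependency; the complement
    rule X -> not A would use the other tail.
    """
    table = [[n_xa, n_x - n_xa],
             [n_a - n_xa, n - n_x - n_a + n_xa]]
    _, p = fisher_exact(table, alternative="greater")
    return p

# Toy example: A occurs in 52 of 70 rows with X, but only 40 of 130 without.
print(rule_significance(n_xa=52, n_x=70, n_a=92, n=200))
```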

Relevance:

20.00%

Publisher:

Abstract:

This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved in the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA at the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for the discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between DNA sequences with evolutionarily conserved cis-regulatory elements and DNA sequences that have evolved neutrally. On further inquiry, the set of highest-scoring putative cis-regulatory elements was found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating neutral DNA evolution. The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight into transcription-level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software supports both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species, including mouse. The data from these genome-wide analyses are stored in a relational database, which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
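
A rough sketch of the innermost building block, scanning a sequence with a log-odds position weight matrix for candidate binding sites. This is illustrative only: the matrix, motif, and threshold below are hypothetical, and this is not the EEL algorithm itself.

```python
def pwm_scan(seq: str, pwm: list, threshold: float):
    """Slide a log-odds position weight matrix over a DNA sequence and
    report candidate transcription factor binding sites as (offset, score).
    """
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Hypothetical 3-column log-odds matrix for a toy motif "TAT".
pwm = [{"A": -2.0, "C": -2.0, "G": -2.0, "T": 1.5},
       {"A": 1.5, "C": -2.0, "G": -2.0, "T": -2.0},
       {"A": -2.0, "C": -2.0, "G": -2.0, "T": 1.5}]
print(pwm_scan("GGTATACTAT", pwm, threshold=4.0))  # [(2, 4.5), (7, 4.5)]
```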

Relevance:

20.00%

Publisher:

Abstract:

In visual object detection and recognition, classifiers have two interesting characteristics: accuracy and speed. Accuracy depends on the complexity of the image features and classifier decision surfaces. Speed depends on the hardware and the computational effort required to use the features and decision surfaces. When attempts to increase accuracy lead to increases in complexity and effort, it is necessary to ask how much we are willing to pay for increased accuracy. For example, if increased computational effort implies quickly diminishing returns in accuracy, then those designing inexpensive surveillance applications cannot aim for maximum accuracy at any cost. It becomes necessary to find trade-offs between accuracy and effort. We study efficient classification of images depicting real-world objects and scenes. Classification is efficient when a classifier can be controlled so that the desired trade-off between accuracy and effort (speed) is achieved and unnecessary computations are avoided on a per-input basis. A framework is proposed for understanding and modeling efficient classification of images. Classification is modeled as a tree-like process. In designing the framework, it is important to recognize what is essential and to avoid structures that are narrow in applicability. Earlier frameworks are lacking in this regard. The overall contribution is two-fold. First, the framework is presented, subjected to experiments, and shown to be satisfactory. Second, certain unconventional approaches are experimented with. This allows the separation of the essential from the conventional. To determine whether the framework is satisfactory, three categories of questions are identified: trade-off optimization, classifier tree organization, and rules for delegation and confidence modeling. Questions and problems related to each category are addressed and empirical results are presented. For example, related to trade-off optimization, we address the problem of computational bottlenecks that limit the range of trade-offs. We also ask whether accuracy versus effort trade-offs can be controlled after training. For another example, regarding classifier tree organization, we first consider the task of organizing a tree in a problem-specific manner. We then ask whether problem-specific organization is necessary.
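
The delegation idea can be sketched as a two-stage cascade in which a cheap classifier answers confident cases and hands the rest to an expensive one. This is a minimal sketch, not the thesis's framework; `cheap`, `expensive`, and the threshold are hypothetical.

```python
def cascade_classify(x, cheap, expensive, confidence_threshold=0.9):
    """Two-stage classifier cascade.

    `cheap` and `expensive` are callables returning (label, confidence).
    The threshold sets the accuracy/effort trade-off and can be changed
    after training.
    """
    label, confidence = cheap(x)
    if confidence >= confidence_threshold:
        return label          # effort saved on easy inputs
    return expensive(x)[0]    # extra effort spent only where needed
```

Raising the threshold shifts more inputs to the expensive stage, trading effort for accuracy on a per-input basis.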

Relevance:

20.00%

Publisher:

Abstract:

The paradigm of computational vision hypothesizes that any visual function, such as the recognition of your grandparent, can be replicated by computational processing of the visual input. What are these computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, in which we attempt to learn the suitable computations from natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and the constraints and objectives specified for the learning process. This thesis consists of an introduction and 7 peer-reviewed publications, where the purpose of the introduction is to illustrate the area of study to a reader who is not familiar with computational vision research. In the scope of the introduction, we briefly overview the primary challenges to visual processing, as well as recall some of the current opinions on visual processing in the early visual systems of animals. Next, we describe the methodology we have used in our research, and discuss the presented results. We have included in this discussion some additional remarks, speculations and conclusions that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast are processed separately in natural systems due to their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence. Further, we provide first-time reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent orientation) processing due to nonlinear projection pursuit with simple objective functions related to sparseness and response energy optimization. Then, we show that attempting to extract independent components of nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts. Such processing might be applicable for priming, i.e., the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selectable from the environment without optimizing the mechanisms themselves. In summary, this thesis explores learning visual processing on several levels. The learning can be understood as an interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence showing that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms. The work also presents some predictions and ideas regarding biological visual processing.
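
In the same spirit, here is a hedged sketch of learning filters by maximization of independence with scikit-learn's FastICA. The random `patches` array is only a stand-in for real whitened natural-image patches, and this is not the publications' exact method.

```python
import numpy as np
from sklearn.decomposition import FastICA

# `patches` is assumed to be an (n_samples, 256) array of mean-removed
# 16x16 grayscale natural-image patches; random data is a placeholder here.
rng = np.random.default_rng(0)
patches = rng.standard_normal((5000, 256))

ica = FastICA(n_components=64, whiten="unit-variance", random_state=0)
ica.fit(patches)

# Each row of components_ is a learned filter. On real natural-image
# patches these tend to come out as localized, oriented, Gabor-like
# filters resembling simple-cell receptive fields.
filters = ica.components_.reshape(64, 16, 16)
print(filters.shape)  # (64, 16, 16)
```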

Relevance:

20.00%

Publisher:

Abstract:

The mobile phone has, as a device, taken the world by storm in the past decade; from only 136 million phones globally in 1996, it is now estimated that by the end of 2008 roughly half of the world's population will own a mobile phone. Over the years, the capabilities of the phones as well as the networks have increased tremendously, reaching the point where the devices are better described as miniature computers than as simple mobile phones. The mobile industry is currently undertaking several initiatives to develop new generations of mobile network technologies; technologies that to a large extent focus on offering ever-increasing data rates. This thesis seeks to answer the question of whether the future mobile networks in development and the future mobile services are in sync: looking five to eight years into the future, will there be services that need the high-performance new networks being planned? The question is especially pertinent in light of the slower-than-expected takeoff of 3G data services. Current and future mobile services are analyzed from two viewpoints: first, by looking at the gradual, evolutionary development of the services and, second, by seeking to identify potential revolutionary new mobile services. With information on both current and future mobile networks as well as services, a mapping from network capabilities to service requirements is performed to identify which services will work in which networks. Based on the analysis, it is far from certain whether the new mobile networks, especially those planned for deployment after HSPA, will be needed as soon as they are currently roadmapped. The true service-based demand for the "beyond HSPA" technologies may be many years in the future, or may indeed never materialize, thanks to the increasing deployment of local-area wireless broadband technologies.

Relevance:

20.00%

Publisher:

Abstract:

With the recent increase in interest in service-oriented architectures (SOA) and Web services, developing applications with the Web services paradigm has become feasible. Web services are self-describing, platform-independent computational elements. New applications can be assembled from a set of previously created Web services, which are composed together to make a service that uses its components to perform a certain task. This is the idea of service composition. To bring service composition to the mobile phone, I have created Interactive Service Composer for mobile phones. With Interactive Service Composer, the user is able to build service compositions on his mobile phone, consisting of Web services or services that are available on the mobile phone itself. The service compositions are reusable and can be saved in the phone's memory. Previously saved compositions can also be used in new compositions. While developing applications for mobile phones has been possible for some time, the usability of the resulting solutions does not match that of desktop software. When developing for mobile phones, the developer has to consider design decisions more carefully than when developing for desktop computers. With limited processing power and memory, the applications cannot function as well as on desktop PCs. On the other hand, this does not remove the appeal of developing applications for mobile devices.
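
The composition idea itself can be sketched in a few lines, with hypothetical callables standing in for Web services; this is not the Interactive Service Composer API.

```python
def compose(*services):
    """Chain services so each one's output feeds the next.

    Each service is a callable here; a real composition would invoke
    Web services or phone-local services instead.
    """
    def composition(data):
        for service in services:
            data = service(data)
        return data
    return composition

# Hypothetical services: fetch a document, then transform it.
fetch = lambda url: f"contents of {url}"
shout = lambda text: text.upper()
pipeline = compose(fetch, shout)
print(pipeline("http://example.org/doc"))  # CONTENTS OF HTTP://EXAMPLE.ORG/DOC
```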

Relevance:

20.00%

Publisher:

Abstract:

This thesis presents a highly sensitive genome-wide search method for recessive mutations. The method is suitable for distantly related samples that are divided into phenotype positives and negatives. High-throughput genotyping arrays are used to identify and compare homozygous regions between the cohorts. The method is demonstrated by comparing colorectal cancer patients against unaffected references. The objective is to find homozygous regions and alleles that are more common in cancer patients. We have designed and implemented software tools to automate the data analysis from genotypes to lists of candidate genes and their properties. The programs have been designed according to a pipeline architecture that allows their integration with other programs, such as biological databases and copy number analysis tools. The integration of the tools is crucial, as the genome-wide analysis of the cohort differences produces many candidate regions not related to the studied phenotype. CohortComparator is a genotype comparison tool that detects homozygous regions and compares their loci and allele constitutions between two sets of samples. The data are visualised in chromosome-specific graphs illustrating the homozygous regions and alleles of each sample. The genomic regions that may harbour recessive mutations are emphasised with different colours, and a scoring scheme is given for these regions. The detection of homozygous regions, the cohort comparisons, and the result annotations all rest on assumptions, many of which have been parameterized in our programs. The effect of these parameters and the suitable scope of the methods have been evaluated. Samples with different resolutions can be balanced with the genotype estimates of their haplotypes, and they can be used within the same study.
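
The core detection step can be sketched as finding runs of consecutive homozygous calls along a chromosome. This is a simplification of what CohortComparator does, and the minimum run length is a hypothetical parameter.

```python
def homozygous_runs(genotypes, min_length=25):
    """Find runs of consecutive homozygous SNP calls.

    `genotypes` is a list of calls such as "AA", "AB", "BB" ordered by
    genomic position; returns (start, end) index pairs of runs at least
    min_length calls long.
    """
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g[0] == g[1]:                # homozygous call, e.g. "AA" or "BB"
            if start is None:
                start = i
        else:                           # heterozygous call ends the run
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs
```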

Relevance:

20.00%

Publisher:

Abstract:

Portable music players have made it possible to listen to a personal collection of music in almost every situation, and they are often used during some activity to provide a stimulating audio environment. Studies have demonstrated the effects of music on the human body and mind, indicating that selecting music according to the situation can, besides making the situation more enjoyable, also make humans perform better. For example, music can boost performance during physical exercise, alleviate stress, and positively affect learning. We believe that people intuitively select different types of music for different situations. Based on this hypothesis, we propose a portable music player, AndroMedia, designed to provide personalised music recommendations using the user's current context and listening habits together with other users' situational listening patterns. We have developed a prototype that consists of a central server and a PDA client. The client uses Bluetooth sensors to acquire context information and logs user interaction to infer implicit user feedback. The user interface also allows the user to give explicit feedback. Large user interface elements facilitate touch-based usage in busy environments. The prototype provides the necessary framework for using the collected information together with other users' listening histories in a context-enhanced collaborative filtering algorithm to generate context-sensitive recommendations. The current implementation is limited to traditional collaborative filtering algorithms. We outline the techniques required to create context-aware recommendations and present a survey of mobile context-aware music recommenders found in the literature. As opposed to the surveyed systems, AndroMedia utilises other users' listening habits when suggesting tunes, and does not require any laborious set-up processes.
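
A rough sketch of the context-enhanced neighbourhood idea, where only listening histories recorded in the same context vote for new tracks. The data layout and names are hypothetical; this is not AndroMedia's implementation.

```python
def recommend(target_user, context, ratings, k=10):
    """Context-enhanced neighbourhood recommendation.

    `ratings` maps (user, context) -> {track: score}; only other users'
    histories from the same context contribute votes.
    """
    own = ratings.get((target_user, context), {})
    scores = {}
    for (user, ctx), history in ratings.items():
        if user == target_user or ctx != context:
            continue                    # use only other users, same context
        for track, score in history.items():
            if track not in own:        # do not re-recommend known tracks
                scores[track] = scores.get(track, 0.0) + score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```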

Relevance:

20.00%

Publisher:

Abstract:

Place identification is the methodology of automatically detecting spatial regions or places that are meaningful to a user by analysing her location traces. Following this approach, several algorithms have been proposed in the literature. Most of the algorithms perform well on a particular data set with a suitable choice of parameter values. However, tuneable parameters make it difficult for an algorithm to generalise to data sets collected from different geographical locations, from different periods of time, or containing different activities. This thesis compares the generalisation performance of our proposed DPCluster algorithm along with six state-of-the-art place identification algorithms on twelve location data sets collected using the Global Positioning System (GPS). Spatial and temporal variations present in the data help us to identify strengths and weaknesses of the place identification algorithms under study. We begin by discussing the notion of a place and its importance in location-aware computing. Next, we discuss the different phases of the place identification process found in the literature, followed by a thorough description of the seven algorithms. After that, we define evaluation metrics, compare the generalisation performance of the individual place identification algorithms, and report the results. The results indicate that the DPCluster algorithm is superior to all other algorithms in terms of generalisation performance.
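
Comparisons of this kind need a matching-based evaluation metric. A hedged sketch of place-level precision and recall follows; the thesis's exact metric definitions may differ, and `match` is a hypothetical predicate (e.g. centroids within a distance threshold).

```python
def precision_recall(detected, truth, match):
    """Place-level precision/recall: a detected place counts as correct
    if it matches some ground-truth place, and a ground-truth place
    counts as found if some detected place matches it.
    """
    correct = sum(1 for d in detected if any(match(d, t) for t in truth))
    found = sum(1 for t in truth if any(match(d, t) for d in detected))
    precision = correct / len(detected) if detected else 0.0
    recall = found / len(truth) if truth else 0.0
    return precision, recall
```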

Relevance:

20.00%

Publisher:

Abstract:

One of the most tangled fields of research is the field of defining and modeling affective concepts, i.e., concepts regarding emotions and feelings. The subject can be approached from many disciplines. The main problem is the lack of generally approved definitions. However, e.g. linguists have recently started to check the consistency of their theories with the help of computer simulations. Definitions of affective concepts are needed for performing similar simulations in the behavioral sciences. In this thesis, preliminary computational definitions of affects for a simple utility-maximizing agent are given. The definitions have been produced by synthesizing ideas from theories from several fields of research. The class of affects is defined as a superclass of emotions and feelings. An affect is defined as a process in which a change in an agent's expected utility causes a bodily change. If the process is currently under the attention of the agent (i.e. the agent is conscious of it), the process is a feeling. If it is not, but can in principle be taken into attention (i.e. it is preconscious), the process is an emotion. Thus, affects do not presuppose consciousness, but emotions and feelings do. Affects directed at unexpected materialized (i.e. past) events are delight and fright. Delight is the consequence of an unexpected positive event and fright is the consequence of an unexpected negative event. Affects directed at expected materialized (i.e. past) events are happiness (expected positive event materialized), disappointment (expected positive event did not materialize), sadness (expected negative event materialized) and relief (expected negative event did not materialize). Affects directed at expected unrealized (i.e. future) events are fear and hope. Some other affects can be defined as directed towards the originators of the events. The affect classification has also been implemented as a computer program, the purpose of which is to ensure the coherence of the definitions and to illustrate the capabilities of the model. The exact content of the bodily changes associated with specific affects is not considered relevant from the point of view of the logical structure of affective phenomena. The utility function also need not be defined, since the target of examination is only its dynamics.
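
The event-directed part of this classification is mechanical enough to state directly in code; a sketch that follows the definitions above (the thesis's own program may differ in structure).

```python
def classify_affect(expected: bool, positive: bool, materialized):
    """Classify an event-directed affect per the definitions above.

    expected: was the event anticipated by the agent?
    positive: does the event raise (True) or lower (False) expected utility?
    materialized: True/False for past events, None for future events.
    """
    if materialized is None:          # affects directed at future events
        return "hope" if positive else "fear"
    if not expected:                  # unexpected events that occurred
        return "delight" if positive else "fright"
    if positive:                      # expected positive event
        return "happiness" if materialized else "disappointment"
    return "sadness" if materialized else "relief"   # expected negative event

assert classify_affect(expected=True, positive=False, materialized=False) == "relief"
```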

Relevance:

20.00%

Publisher:

Abstract:

Real-time scheduling algorithms, such as Rate Monotonic and Earliest Deadline First, guarantee that computations complete within a pre-defined time. As many real-time systems operate on limited battery power, these algorithms have been enhanced with power-aware properties. In this thesis, 13 power-aware real-time scheduling algorithms for processor-, device- and system-level use are explored.
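
For reference, the core dispatch rule of Earliest Deadline First is a one-liner. This is a textbook sketch; the surveyed power-aware variants add, e.g., voltage and frequency scaling decisions on top of such a rule.

```python
def edf_pick(ready_tasks):
    """Earliest Deadline First dispatch: among the ready tasks, run the
    one with the nearest absolute deadline.
    """
    return min(ready_tasks, key=lambda task: task["deadline"])

ready = [{"name": "sensor", "deadline": 12}, {"name": "radio", "deadline": 8}]
print(edf_pick(ready)["name"])  # radio
```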

Relevance:

20.00%

Publisher:

Abstract:

Current smartphones have a storage capacity of several gigabytes, and more and more information is stored on mobile devices. To meet the challenge of information organization, we turn to desktop search. Users often possess multiple devices and synchronize (subsets of) information between them, which makes file synchronization increasingly important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text, and the EXIF data of JPEG files may be used in finding them. User-defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query. The Dessy prototype uses the BM25 ranking function, widely used in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization, optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees. It allows finding and synchronizing files that reside on remote computers or on the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization, while also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine over a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption, and the symmetric encryption keys are exchanged with RSA key exchange. Dessy emphasizes extensibility: the cryptography can be extended, users may tag their files with context tags and control custom file metadata, and adding new indexed file types, metadata fields, ranking methods, and index types is easy. Files are located with virtual directories, which are views into the user's files, browseable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind. It requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
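
For reference, a self-contained sketch of the BM25 ranking function mentioned above. The k1 and b values are the common defaults from the information retrieval literature, not necessarily the ones Dessy uses.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query.

    doc_terms is the document's token list; doc_freq maps a term to the
    number of indexed files containing it; avgdl is the average document
    length in the index.
    """
    score = 0.0
    dl = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score
```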