956 resultados para Data compression (Computer science)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of the steps of the web search evaluation pipeline. As a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step to be rejected after the online evaluation step. As a result, this would allow us to achieve a higher efficiency of the entire evaluation pipeline. Secondly, we state the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments, and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, i.e. in domains with a grid-based representation, such as image search. Our study using datasets of interleaving experiments performed both in document and image search domains demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply the sequential testing methods to reduce the mean deployment time for the interleaving experiments. We adapt two sequential tests for the interleaving experimentation. We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reasoning with if-then rules –in particular, with those taking from of implications between conjunctions of attributes– is crucial in many disciplines ranging from theoretical computer science to applications. One of the most important problems regarding the rules is to remove redundancies in order to obtain equivalent implicational sets with lower size.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The European Multidisciplinary Seafloor and water-column Observatory (EMSO) European Research Infrastructure Consortium (ERIC) provides power, communications, sensors, and data infrastructure for continuous, high-resolution, (near-)real-time, interactive ocean observations across a multidisciplinary and interdisciplinary range of research areas including biology, geology, chemistry, physics, engineering, and computer science, from polar to subtropical environments, through the water column down to the abyss. Eleven deep-sea and four shallow nodes span from the Arctic through the Atlantic and Mediterranean, to the Black Sea. Coordination among the consortium nodes is being strengthened through the EMSOdev project (H2020), which will produce the EMSO Generic Instrument Module (EGIM). Early installations are now being upgraded, for example, at the Ligurian, Ionian, Azores, and Porcupine Abyssal Plain (PAP) nodes. Significant findings have been flowing in over the years; for example, high-frequency surface and subsurface water-column measurements of the PAP node show an increase in seawater pCO2 (from 339 μatm in 2003 to 353 μatm in 2011) with little variability in the mean air-sea CO2 flux. In the Central Eastern Atlantic, the Oceanic Platform of the Canary Islands open-ocean canary node (aka ESTOC station) has a long-standing time series on water column physical, biogeochemical, and acidification processes that have contributed to the assessment efforts of the Intergovernmental Panel on Climate Change (IPCC). EMSO not only brings together countries and disciplines but also allows the pooling of resources and coordination to assemble harmonized data into a comprehensive regional ocean picture, which will then be made available to researchers and stakeholders worldwide on an open and interoperable access basis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of this study is to provide a framework for future researchers to understand and use the FARSITE wildfire-forecasting model with data assimilation. Current wildfire models lack the ability to provide accurate prediction of fire front position faster than real-time. When FARSITE is coupled with a recursive ensemble filter, the data assimilation forecast method improves. The scope includes an explanation of the standalone FARSITE application, technical details on FARSITE integration with a parallel program coupler called OpenPALM, and a model demonstration of the FARSITE-Ensemble Kalman Filter software using the FireFlux I experiment by Craig Clements. The results show that the fire front forecast is improved with the proposed data-driven methodology than with the standalone FARSITE model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

International audience

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Part 14: Interoperability and Integration

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Inter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject behaviour during the task. In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. Finally, we parcellate all subjects data with a spectral clustering of the PLS latent variables. We present results of the application of the proposed method on both single-subject and multi-subject fMRI datasets. Preliminary experimental results, evaluated with intra-parcel variance of GLM t-values and PLS derived t-values, indicate that this data-driven approach offers improvement in terms of parcellation accuracy over GLM based techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sequence problems belong to the most challenging interdisciplinary topics of the actuality. They are ubiquitous in science and daily life and occur, for example, in form of DNA sequences encoding all information of an organism, as a text (natural or formal) or in form of a computer program. Therefore, sequence problems occur in many variations in computational biology (drug development), coding theory, data compression, quantitative and computational linguistics (e.g. machine translation). In recent years appeared some proposals to formulate sequence problems like the closest string problem (CSP) and the farthest string problem (FSP) as an Integer Linear Programming Problem (ILPP). In the present talk we present a general novel approach to reduce the size of the ILPP by grouping isomorphous columns of the string matrix together. The approach is of practical use, since the solution of sequence problems is very time consuming, in particular when the sequences are long.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side though, these behavioral health issues (and associated possible diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight with a diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness.^ Evidence-based patient-centered Brief Motivational Interviewing (BMI) interventions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long-term. Lack of locally available personnel well-trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge. Success of the CBIs, however, critically relies on insuring engagement and retention of CBI users so that they remain motivated to use these systems and come back to use them over the long term as necessary.^ Because of their text-only interfaces, current CBIs can therefore only express limited empathy and rapport, which are the most important factors of health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities. Virtual characters interact using humans’ innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems.^ To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent’s social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to reduce serious health incidents, individuals with high risks need to be identified as early as possible so that effective intervention and preventive care can be provided. This requires regular and efficient assessments of risk within communities that are the first point of contacts for individuals. Clinical Decision Support Systems CDSSs have been developed to help with the task of risk assessment, however such systems and their underpinning classification models are tailored towards those with clinical expertise. Communities where regular risk assessments are required lack such expertise. This paper presents the continuation of GRiST research team efforts to disseminate clinical expertise to communities. Based on our earlier published findings, this paper introduces the framework and skeleton for a data collection and risk classification model that evaluates data redundancy in real-time, detects the risk-informative data and guides the risk assessors towards collecting those data. By doing so, it enables non-experts within the communities to conduct reliable Mental Health risk triage.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The phosphatidylinositide 3-kinases (PI3K) and mammalian target of rapamycin-1 (mTOR1) are two key targets for anti-cancer therapy. Predicting the response of the PI3K/AKT/mTOR1 signalling pathway to targeted therapy is made difficult because of network complexities. Systems biology models can help explore those complexities but the value of such models is dependent on accurate parameterisation. Motivated by a need to increase accuracy in kinetic parameter estimation, and therefore the predictive power of the model, we present a framework to integrate kinetic data from enzyme assays into a unified enzyme kinetic model. We present exemplar kinetic models of PI3K and mTOR1, calibrated on in vitro enzyme data and founded on Michaelis-Menten (MM) approximation. We describe the effects of an allosteric mTOR1 inhibitor (Rapamycin) and ATP-competitive inhibitors (BEZ2235 and LY294002) that show dual inhibition of mTOR1 and PI3K. We also model the kinetics of phosphatase and tensin homolog (PTEN), which modulates sensitivity of the PI3K/AKT/mTOR1 pathway to these drugs. Model validation with independent data sets allows investigation of enzyme function and drug dose dependencies in a wide range of experimental conditions. Modelling of the mTOR1 kinetics showed that Rapamycin has an IC50 independent of ATP concentration and that it is a selective inhibitor of mTOR1 substrates S6K1 and 4EBP1: it retains 40% of mTOR1 activity relative to 4EBP1 phosphorylation and inhibits completely S6K1 activity. For the dual ATP-competitive inhibitors of mTOR1 and PI3K, LY294002 and BEZ235, we derived the dependence of the IC50 on ATP concentration that allows prediction of the IC50 at different ATP concentrations in enzyme and cellular assays. Comparison of the drug effectiveness in enzyme and cellular assays showed that some features of these drugs arise from signalling modulation beyond the on-target action and MM approximation and require a systems-level consideration of the whole PI3K/PTEN/AKT/mTOR1 network in order to understand mechanisms of drug sensitivity and resistance in different cancer cell lines. We suggest that using these models in systems biology investigation of the PI3K/AKT/mTOR1 signalling in cancer cells can bridge the gap between direct drug target action and the therapeutic response to these drugs and their combinations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but just few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without touching upon the correlation among them. In order to overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, leading to the thoughtfulness that attributes correlations cannot be put aside. In fact, the clusters' compactness are 7 to 25 times better using our method. Our framework also proves to be an useful tool to assist meteorologists in understanding the climate behavior along a period of time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The fourth industrial revolution, also known as Industry 4.0, has rapidly gained traction in businesses across Europe and the world, becoming a central theme in small, medium, and large enterprises alike. This new paradigm shifts the focus from locally-based and barely automated firms to a globally interconnected industrial sector, stimulating economic growth and productivity, and supporting the upskilling and reskilling of employees. However, despite the maturity and scalability of information and cloud technologies, the support systems already present in the machine field are often outdated and lack the necessary security, access control, and advanced communication capabilities. This dissertation proposes architectures and technologies designed to bridge the gap between Operational and Information Technology, in a manner that is non-disruptive, efficient, and scalable. The proposal presents cloud-enabled data-gathering architectures that make use of the newest IT and networking technologies to achieve the desired quality of service and non-functional properties. By harnessing industrial and business data, processes can be optimized even before product sale, while the integrated environment enhances data exchange for post-sale support. The architectures have been tested and have shown encouraging performance results, providing a promising solution for companies looking to embrace Industry 4.0, enhance their operational capabilities, and prepare themselves for the upcoming fifth human-centric revolution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, IoT technology has radically transformed many crucial industrial and service sectors such as healthcare. The multi-facets heterogeneity of the devices and the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability in the IoT landscape constitute a significant obstacle in the pursuit of this goal. Moreover, achieving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques such as automated reasoning. In this thesis work, Semantic Web technologies have been investigated as an approach to address both the data integration and reasoning aspect in modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology that has been developed in order to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports a seamless integration of heterogeneous devices and data sources.