858 results for Data pre-processing
Abstract:
Funded by The Scottish Government
Abstract:
Current state-of-the-art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR-specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.
Abstract:
AIMS: Mutation detection accuracy has been described extensively; however, it is surprising that pre-PCR processing of formalin-fixed paraffin-embedded (FFPE) samples has not been systematically assessed in a clinical context. We designed a RING trial to (i) investigate pre-PCR variability, (ii) correlate pre-PCR variation with EGFR/BRAF mutation testing accuracy and (iii) investigate causes for observed variation. METHODS: 13 molecular pathology laboratories were recruited. 104 blinded FFPE curls including engineered FFPE curls, cell-negative FFPE curls and control FFPE tissue samples were distributed to participants for pre-PCR processing and mutation detection. Follow-up analysis was performed to assess sample purity, DNA integrity and DNA quantitation. RESULTS: The rate of mutation detection failure was 11.9%. Of these failures, 80% were attributed to pre-PCR error. Significant differences in DNA yields across all samples were seen using analysis of variance (p
Abstract:
The advancement of GPS technology has made it possible to use GPS devices as orientation and navigation tools, but also as tools to track spatiotemporal information. GPS tracking data can be broadly applied in location-based services, such as spatial distribution of the economy, transportation routing and planning, traffic management and environmental control. Therefore, knowledge of how to process the data from a standard GPS device is crucial for further use. Previous studies have considered various issues of the data processing at the time. This paper, however, aims to outline a general procedure for processing GPS tracking data. The procedure is illustrated step-by-step by the processing of real-world GPS data of car movements in Borlänge in the centre of Sweden.
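The abstract does not list the individual steps of its procedure, so the following is only a sketch of one step that GPS track cleaning commonly includes: dropping fixes that imply an implausible speed. The tuple layout, the function names and the 70 m/s threshold are assumptions made for illustration, not details taken from the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def drop_speed_outliers(fixes, max_speed_mps=70.0):
    """Keep fixes whose speed relative to the last kept fix is plausible.

    `fixes` is a list of (unix_time_s, lat_deg, lon_deg) tuples sorted by
    time. Both the layout and the threshold are hypothetical choices.
    """
    kept = []
    for t, lat, lon in fixes:
        if not kept:
            kept.append((t, lat, lon))
            continue
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = haversine_m(lat0, lon0, lat, lon) / dt
        if speed <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept
```

Comparing each fix with the last kept fix, rather than with its immediate predecessor, means that a single bad position does not also cause the good fix that follows it to be discarded.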
Abstract:
The generation of heterogeneous big data sources with ever-increasing volumes, velocities and veracities over the last few years has inspired the data science and research community to address the challenge of extracting knowledge from big data. Such a wealth of generated data across the board can be intelligently exploited to advance our knowledge about our environment, public health, critical infrastructure and security. In recent years we have developed generic approaches to process such big data at multiple levels for advancing decision support. These approaches specifically concern data processing with semantic harmonisation, low-level fusion, analytics, and knowledge modelling with high-level fusion and reasoning. They will be introduced and presented in the context of the TRIDEC project results on critical oil and gas industry drilling operations and also the ongoing large eVacuate project on critical crowd behaviour detection in confined spaces.
Abstract:
Over recent decades, work on infrared sensor applications has advanced considerably worldwide. A difficulty remains, however: objects in the image obtained for the observed scene are often not clear enough, or cannot always be easily distinguished. Infrared image enhancement has played an important role in the development of infrared computer vision, image processing, non-destructive testing and related technologies. This thesis addresses infrared image enhancement techniques from two angles: the processing of a single infrared image in the hybrid space-frequency domain, and the fusion of infrared and visible images using the nonsubsampled contourlet transform (NSCT). Image fusion can be regarded as a further exploration of the single infrared image enhancement model: it combines infrared and visible images into a single image that represents and enhances all the useful information and features of the source images, since a single image cannot contain all the relevant or available information owing to the restrictions of any single imaging sensor. We review the development of infrared image enhancement techniques, then turn to single infrared image enhancement and propose a hybrid-domain enhancement scheme with an improved fuzzy threshold evaluation method, which yields higher image quality and improves human visual perception. The infrared and visible image fusion techniques rely on accurate registration of the source images acquired by the different sensors.
The SURF-RANSAC algorithm is applied for registration throughout the research, which yields very accurately registered images and greater benefits for the fusion processing. For infrared and visible image fusion, a series of advanced and efficient approaches is proposed. A standard multi-channel NSCT-based fusion method is presented as a reference for the fusion approaches proposed thereafter. A joint fusion approach involving the Adaptive-Gaussian NSCT and the wavelet transform (WT) is proposed, which gives better fusion results than those obtained with general non-adaptive methods. An NSCT-based fusion approach is proposed that applies compressed sensing (CS) and total variation (TV) to sparsely sampled coefficients and accurately reconstructs the fused coefficients; it obtains much better fusion results by pre-enhancing the infrared image and reducing the redundant information in the fusion coefficients. Finally, an NSCT-based fusion procedure using a fast iterative-shrinking compressed sensing (FISCS) technique is proposed to compress the decomposed coefficients and reconstruct the fused coefficients during fusion, which gives better results more quickly and efficiently.
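To illustrate the RANSAC half of the registration step, here is a minimal sketch that estimates a pure 2-D translation from putative keypoint matches and rejects outliers. It is a simplification under stated assumptions: the thesis does not say which transformation model it fits, SURF keypoint detection and matching are not shown, and the function name, tolerance and iteration count are hypothetical.

```python
import random


def ransac_translation(matches, tol=2.0, iterations=100, seed=0):
    """Estimate a 2-D translation from noisy point correspondences.

    `matches` is a non-empty list of ((x, y), (x2, y2)) pairs, e.g. matched
    keypoints from an infrared and a visible image. Returns the refined
    translation (dx, dy) and the list of inlier matches.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by a single correspondence.
        (x, y), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x, y2 - y
        inliers = [
            (a, b) for a, b in matches
            if abs(b[0] - a[0] - dx) <= tol and abs(b[1] - a[1] - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over the consensus set.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers
```

Registering images from different sensors usually needs a richer model, such as an affine transform or a homography, which requires more correspondences per hypothesis; the sample-score-refine loop stays the same.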
Abstract:
The Data Processing Department of ISHC has developed coding forms to be used for the data to be entered into the program. The Highway Planning and Programming and the Design Departments are responsible for coding and submitting the necessary data forms to Data Processing for the noise prediction on the highway sections.
Abstract:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By indicating to the analyst which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents that carry only a small amount of structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions, using the semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.
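The idea of highlighting 'unexpected' cube values can be sketched on a two-dimensional slice: fit a simple additive model (row effect plus column effect) and flag cells whose residual is large compared with the spread of all residuals. This is an illustrative simplification, not the system described in the abstract; the function name, the dict-of-dicts cube layout and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev


def surprising_cells(cube, z_threshold=2.0):
    """Return the (row, column) cells that deviate most from an additive model.

    `cube` maps row label -> {column label -> numeric measure}, with every
    row holding the same columns (a hypothetical layout for this sketch).
    """
    rows = list(cube)
    cols = list(cube[rows[0]])
    grand = mean(cube[r][c] for r in rows for c in cols)
    row_mean = {r: mean(cube[r][c] for c in cols) for r in rows}
    col_mean = {c: mean(cube[r][c] for r in rows) for c in cols}
    # Expected value of a cell = row effect + column effect - overall level.
    residual = {
        (r, c): cube[r][c] - (row_mean[r] + col_mean[c] - grand)
        for r in rows for c in cols
    }
    spread = pstdev(list(residual.values()))
    if spread <= 1e-9 * max(1.0, abs(grand)):
        return []  # the data are (numerically) perfectly additive
    return sorted(
        cell for cell, res in residual.items()
        if abs(res) / spread > z_threshold
    )
```

On small slices a single anomaly also shifts the row and column means it belongs to, so the threshold needs tuning to the cube size; a production system would use a more robust fit.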
Abstract:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. 
The second model, called Vertex Importance Query (VIQ), identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns the top-k of them using user-specified approximation terms and scoring functions. In the fourth model, called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers, such as the number of mapped blocks, vertices' properties in each block and so on, and the top-k most probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation they consider important, (ii) how to score answers, irrespective of whether they are vertices or substitutions, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.
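As a baseline for what a top-k query over substitutions computes (roughly the ORDER BY and LIMIT behaviour mentioned above), the sketch below enumerates every match of one fixed two-edge pattern over a set of triples and keeps the k highest-scoring substitutions. It deliberately uses brute-force enumeration with none of the pruning or indexing that the thesis proposes; the pattern, the predicate names `knows` and `worksAt`, and the additive scoring function are hypothetical.

```python
import heapq


def top_k_matches(triples, importance, k):
    """Top-k substitutions for the pattern ?a -knows-> ?b -worksAt-> ?c.

    `triples` is an iterable of (subject, predicate, object) tuples and
    `importance` maps a vertex to a numeric score (missing vertices count 0).
    The score of a substitution is the sum of its vertices' importance.
    """
    # Index objects by (subject, predicate) so edge lookups are O(1).
    index = {}
    for s, p, o in triples:
        index.setdefault((s, p), []).append(o)

    def matches():
        for (a, p), bs in index.items():
            if p != "knows":
                continue
            for b in bs:
                for c in index.get((b, "worksAt"), ()):
                    yield (a, b, c)

    def score(match):
        return sum(importance.get(v, 0.0) for v in match)

    # nlargest keeps only k candidates in memory while scanning all matches.
    return heapq.nlargest(k, matches(), key=score)
```

Every match is still generated and scored here, which is exactly the cost that pruning techniques aim to avoid on large graphs.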
Abstract:
This document does NOT address the issue of oxygen data quality control (either real-time or delayed mode). As a preliminary step towards that goal, this document seeks to ensure that all countries deploying floats equipped with oxygen sensors document the data and metadata related to these floats properly. We produced this document in response to action item 14 from the AST-10 meeting in Hangzhou (March 22-23, 2009). Action item 14: Denis Gilbert to work with Taiyo Kobayashi and Virginie Thierry to ensure DACs are processing oxygen data according to recommendations. If the recommendations contained herein are followed, we will end up with a more uniform set of oxygen data within the Argo data system, allowing users to begin analysing not only their own oxygen data, but also those of others, in the true spirit of Argo data sharing. Indications provided in this document are valid as of the date of writing. It is very likely that changes in sensors, calibrations and conversion equations will occur in the future. Please contact V. Thierry (vthierry@ifremer.fr) regarding any inconsistencies or missing information. A dedicated webpage on the Argo Data Management website (www) contains all information regarding Argo oxygen data management: current and previous versions of this cookbook, oxygen sensor manuals, calibration sheet examples, examples of matlab code to process oxygen data, test data, etc.
Abstract:
This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed context-aware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams in an existing context-aware architecture. Current architectures require integration work for new streams and new contexts that are encountered. This means of operation is acceptable for current fixed architectures. However, as systems become more mobile the number of discoverable streams increases. Without the ability to discover and use these new streams, the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires, then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted, then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls. This local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on the pre-pended metadata. This was the reference system.
Developed for a mobile system, it processed the data based on the attached metadata, and the code that performed the processing was static. The second system was developed to move away from static processing and introduce greater freedom in handling the data stream; this resulted in a heavyweight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together. By providing a small management class that would load a separate handler based on the incoming data, dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches. The first is a quantitative analysis of the code to understand the relative complexity of the three systems; this was done by evaluating what changes to the system were involved for the new context. The second takes a qualitative view of the work required by the software engineer to reconfigure the systems to provide support for a new data stream. The evaluation highlights the scenarios to which each of the three systems is most suited. There is always a trade-off in the development of a system, and the three approaches highlight this fact. A statically bound system can be quick to develop but may need to be completely rewritten if the requirements move too far. Alternatively, a highly dynamic system may be able to cope with new requirements, but the developer time to create such a system may be greater than that needed to create several simpler systems.
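The third design, a small management class that selects a separate handler according to the incoming data, might look roughly like the sketch below. The class name, the message layout (`type` and `payload` keys) and the in-process registry are all assumptions; the thesis may well load handler code dynamically rather than register callables up front.

```python
class StreamManager:
    """Route each sensed-context message to the handler for its data type."""

    def __init__(self):
        self._handlers = {}

    def register(self, data_type, handler):
        """Associate a callable with a data type; a later call replaces it."""
        self._handlers[data_type] = handler

    def handle(self, message):
        """Dispatch `message` (a dict with 'type' and 'payload' keys)."""
        handler = self._handlers.get(message["type"])
        if handler is None:
            raise LookupError(f"no handler for type {message['type']!r}")
        return handler(message["payload"])


# Usage: supporting a new stream means registering one more handler;
# the manager itself does not change.
manager = StreamManager()
manager.register("temperature_c", float)
manager.register("gps", lambda payload: tuple(float(v) for v in payload.split(",")))
```

Keeping the manager free of any type-specific logic is what lets new context streams be added without touching existing code, which is the trade-off the evaluation above weighs against development effort.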
Abstract:
The CATARINA Leg1 cruise was carried out from June 22 to July 24 2012 on board the B/O Sarmiento de Gamboa, under the scientific supervision of Aida Rios (CSIC-IIM). It included an occurrence of the OVIDE hydrological section, which was performed in June 2002, 2004, 2006, 2008 and 2010 as part of the CLIVAR program (name A25) and under the supervision of Herlé Mercier (CNRS-LPO). This section begins near Lisbon (Portugal), runs through the West European Basin and the Iceland Basin, crosses the Reykjanes Ridge (300 miles north of the Charlie-Gibbs Fracture Zone), and ends at Cape Hoppe (southeast tip of Greenland). The objective of this repeated hydrological section is to monitor the variability of water mass properties and main current transports in the basin, complementing the international observation array relevant for climate studies. In addition, the Labrador Sea was partly sampled (stations 101-108) between Greenland and Newfoundland, but heavy weather conditions prevented the completion of the section south of 53°40’N. The quality of CTD data is essential to reach the first objective of the CATARINA project, i.e. to quantify the Meridional Overturning Circulation and water mass ventilation changes and their effect on the changes in the anthropogenic carbon ocean uptake and storage capacity. The CATARINA project was mainly funded by the Spanish Ministry of Sciences and Innovation and co-funded by the Fondo Europeo de Desarrollo Regional. The hydrological OVIDE section includes 95 surface-bottom stations from coast to coast, collecting profiles of temperature, salinity, oxygen and currents, spaced 2 to 25 Nm apart depending on the steepness of the topography. The position of the stations closely follows that of OVIDE 2002. In addition, 8 stations were carried out in the Labrador Sea.
From the 24 bottles closed at various depths at each station, samples of sea water are used for salinity and oxygen calibration, and for measurements of biogeochemical components that are not reported here. The data were acquired with a Seabird CTD (SBE911+) and an SBE43 for the dissolved oxygen, belonging to the Spanish UTM group. The SBE Data Processing software was used after decoding and cleaning the raw data. Then, the LPO matlab toolbox was used to calibrate and bin the data, as was done for the previous OVIDE cruises, using on the one hand pre- and post-cruise calibration results for the pressure and temperature sensors (done at Ifremer) and on the other hand the water samples from the 24 bottles of the rosette at each station for the salinity and dissolved oxygen data. A final accuracy of 0.002°C, 0.002 psu and 0.04 ml/l (2.3 umol/kg) was obtained on the final profiles of temperature, salinity and dissolved oxygen, compatible with the international requirements set by the WOCE program.
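The binning step mentioned above can be illustrated with a short sketch that averages profile samples into fixed-width pressure bins. This is not the LPO toolbox (which is written in matlab and whose exact binning rule is not described here); the function name, the 1 dbar default width and the nearest-centre assignment are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean


def bin_profile(pressures_dbar, values, bin_width_dbar=1.0):
    """Average `values` into pressure bins centred on multiples of the width.

    Returns a list of (bin_centre_dbar, mean_value) pairs sorted by pressure.
    Bins that receive no samples are simply absent from the result.
    """
    bins = defaultdict(list)
    for p, v in zip(pressures_dbar, values):
        # floor(x + 0.5) rounds half up, avoiding Python's banker's rounding.
        centre = math.floor(p / bin_width_dbar + 0.5) * bin_width_dbar
        bins[centre].append(v)
    return [(centre, mean(vs)) for centre, vs in sorted(bins.items())]
```

For example, samples at 0.6, 1.2 and 1.4 dbar all fall into the bin centred on 1 dbar and are replaced by their mean.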
Abstract:
Freeze-drying technology can give vegetables and fruits good quality attributes in terms of color, nutrition, volume, rehydration kinetics and stability during storage, among others, when compared with solely air-dried products. However, published scientific work has shown that treatments applied before and after air dehydration affect food attributes and can improve quality. The present thesis therefore focused on an extensive survey of scientific work showing the possibility of applying a pre-treatment and a post-treatment to food products, combined with conventional air drying, with the aim of approaching, or even exceeding, the quality that a freeze-dried product can give. The attributes considered are enzymatic inactivation, stability during storage, drying and rehydration kinetics, color, nutrition, volume and texture/structure. The pre-treatments studied in the present work were water blanching, steam blanching, ultrasound, freezing, high pressure and osmotic dehydration. High pulsed electric field was also studied, but its effect on food attributes was not explained in detail. Water and steam blanching were shown to be adequate for inactivating enzymes in order to prevent enzymatic browning and preserve product quality during long storage periods. For ultrasound pre-treatment, the published results indicated that ultrasound is effective in reducing subsequent drying times and improving rehydration kinetics and color retention. On the other hand, studies showed that ultrasound allows sugar losses and, in some cases, can lead to cell disruption. For freezing pre-treatment, an overall conclusion was difficult to draw for some food attributes, since each fruit or vegetable is unique and freezing involves many variables. However, for the studied cases, freezing was shown to be a pre-treatment able to enhance rehydration kinetics and color attributes.
High pressure pre-treatment was shown to inactivate enzymes, improving the storage stability of food, and to perform well in terms of rehydration. For other attributes, when high pressure technology was applied, the literature showed divergent results depending on the crops used. Finally, osmotic dehydration has been widely used in food processing to incorporate a desired salt or sugar present in aqueous solution into the cellular structure of the food matrix (an improvement of the nutrition attribute). Moreover, osmotic dehydration leads to shorter drying times, and the impregnation of solutes during osmosis strengthens the cellular structure of the food. As for post-treatments, puffing and a newer technology known as instant controlled pressure drop (DIC) were reported in the literature as treatments able to improve diverse food attributes. The two technologies are similar in that the product is subjected to a high pressure step and the process can make use of different heating media such as CO2, steam, air and N2. However, there is a significant difference in the final stage of the two, which can compromise the quality of the final product. Puffing and DIC are used to expand cellular tissues, improving the volume of food samples and aiding subsequent rehydration kinetics, among other effects. The effectiveness of such pre- and/or post-treatments depends on the state of the vegetables and fruits used, which in turn depends on their cellular structure, variety, origin, state (fresh, ripe, raw), harvesting conditions, etc. In conclusion, as seen in the open literature, the application of pre-treatments and post-treatments coupled with conventional air dehydration aims to give dehydrated food products of similar quality to freeze-dried ones.
In the present Master's thesis, the experimental data were removed for reasons of confidentiality relating to the company Unilever R&D Vlaardingen.
Abstract:
By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as “networks on wheels”, can greatly enhance traffic safety, traffic efficiency and the driving experience for intelligent transportation systems (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, pose critical challenges to the efficiency and reliability of VANET implementations. This dissertation is motivated by the great application potential of VANETs and addresses the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination and data collection, this dissertation research aims to enhance traffic safety and traffic efficiency, and to develop novel commercial applications, based on VANETs, in four respects: 1) accurate and efficient message aggregation to detect on-road safety relevant events, 2) reliable data dissemination to notify remote vehicles, 3) efficient and reliable spatial data collection from vehicular sensors, and 4) novel promising applications to exploit the commercial potential of VANETs. Specifically, to enable cooperative detection of safety relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The relative position based message dissemination (RPB-MD) scheme is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone-of-relevance under varying traffic density. Because VANETs make large amounts of vehicular sensor data available, the compressive sampling based data collection (CS-DC) scheme is proposed to efficiently collect spatial relevance data on a large scale, especially in dense traffic.
In addition, with novel and efficient solutions proposed for the application-specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit their commercial potential, namely general purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability of in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation will help push VANETs further towards the stage of massive deployment.
Abstract:
Healthcare systems have assimilated information and communication technologies in order to improve the quality of healthcare and the patient's experience at reduced costs. The increasing digitalization of people's health information, however, raises new threats regarding information security and privacy. Accidental or deliberate breaches of health data may lead to societal pressures, embarrassment and discrimination. Information security and privacy are paramount to achieving high quality healthcare services and, further, to not harming individuals when providing care. With that in mind, we give special attention to the category of Mobile Health (mHealth) systems, that is, the use of mobile devices (e.g., mobile phones, sensors, PDAs) to support medical and public health. Such systems have been particularly successful in developing countries, taking advantage of the flourishing mobile market and the need to expand the coverage of primary healthcare programs. Many mHealth initiatives, however, fail to address security and privacy issues. This, coupled with the lack of specific legislation for privacy and data protection in these countries, increases the risk of harm to individuals. The overall objective of this thesis is to enhance knowledge regarding the design of security and privacy technologies for mHealth systems. In particular, we deal with mHealth Data Collection Systems (MDCSs), which consist of mobile devices for collecting and reporting health-related data, replacing paper-based approaches for health surveys and surveillance. This thesis consists of publications contributing to mHealth security and privacy in various ways: a comprehensive literature review about mHealth in Brazil; the design of a security framework for MDCSs (SecourHealth); the design of an MDCS (GeoHealth); the design of a Privacy Impact Assessment template for MDCSs; and the study of ontology-based obfuscation and anonymisation functions for health data.