838 results for text and data mining


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The impact of cancer upon children, teenagers and young people can be profound. Research has been undertaken to explore the impacts upon children, teenagers and young people with cancer, but little is known about how researchers can ‘best’ engage with this group to explore their experiences. This review paper provides an overview of the utility of data collection methods employed when undertaking research with children, teenagers and young people. A systematic review of relevant databases was undertaken utilising the search terms ‘young people’, ‘young adult’, ‘adolescent’ and ‘data collection methods’. The full text of the papers that were deemed eligible from the title and abstract was accessed and, following discussion within the research team, thirty papers were included. Findings: Due to the heterogeneity in the scope of the papers identified, the following data collection methods were included in the results section. Three of the papers identified provided an overview of data collection methods utilised with this population, and the remaining twenty-seven papers covered the following data collection methods: digital technologies; art-based research; comparing the use of ‘paper and pencil’ research with web-based technologies; the use of games; the use of a specific communication tool; questionnaires and interviews; focus groups; and telephone interviews/questionnaires. The strengths and limitations of the range of data collection methods included are discussed, drawing upon such issues as the appropriateness of particular methods for particular age groups, or the most appropriate method to employ when exploring a particularly sensitive topic area. Conclusions: There are a number of data collection methods utilised to undertake research with children, teenagers and young adults. This review provides a summary of the current available evidence and an overview of the strengths and limitations of data collection methods employed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

C3S2E '16 Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This document presents the catalogue techniques used at network GDAC level to facilitate the discovery of platforms and data files. Some AtlantOS networks are organized as DAC-GDACs that continuously update a catalogue of metadata on observation datasets and platforms:
• A DAC is a Data Assembly Centre operating at national or regional scale. It manages data and metadata for its area, with a direct link to scientists and operators. The DAC pushes observations to the network GDAC.
• A GDAC is a Global Data Assembly Centre. It is designed for a global observation network such as Argo, OceanSITES, DBCP, EGO, Gosud, etc. The GDAC aggregates the data and metadata of an observation network, in real-time and delayed mode, as provided by DACs.
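The DAC-to-GDAC flow described above can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the class names `CatalogueEntry` and `GdacCatalogue`, the record fields, and the rule that a later push for the same file replaces the earlier record. The abstract does not describe the actual catalogue format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One metadata record for a data file (hypothetical fields)."""
    network: str    # observation network name, e.g. "Argo"
    dac: str        # DAC that pushed the file
    platform: str   # platform identifier
    file_path: str  # location of the data file on the GDAC
    mode: str       # "real-time" or "delayed"


class GdacCatalogue:
    """In-memory stand-in for a GDAC catalogue fed by several DACs."""

    def __init__(self):
        self._entries = {}

    def push(self, entry):
        # Assumption: a later push for the same file replaces the earlier
        # record, mimicking a continuously updated catalogue.
        self._entries[entry.file_path] = entry

    def find(self, platform=None, dac=None, mode=None):
        """Return entries matching every filter that is not None."""
        results = []
        for entry in self._entries.values():
            if platform is not None and entry.platform != platform:
                continue
            if dac is not None and entry.dac != dac:
                continue
            if mode is not None and entry.mode != mode:
                continue
            results.append(entry)
        return sorted(results, key=lambda e: e.file_path)
```

Keying the catalogue on the file path keeps discovery queries simple: a user filters by platform, DAC or mode without needing to know which DAC originally supplied a file.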

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Oysters play an important role in estuarine and coastal marine habitats, where the majority of humans live. In these ecosystems, environmental degradation is substantial, and oysters must cope with highly dynamic and stressful environmental constraints during their lives in the intertidal zone. The availability of the genome sequence of the Pacific oyster Crassostrea gigas represents a unique opportunity for a comprehensive assessment of the signal transduction pathways that the species has developed to deal with this unique habitat. We performed an in silico analysis to identify, annotate and classify protein kinases in C. gigas according to their kinase domain taxonomy classification, and compared the result with the kinomes already described in other animal species. The C. gigas kinome consists of 371 protein kinases, making it closely related to the sea urchin kinome, which has 353 protein kinases. The absence of gene redundancy in some groups of the C. gigas kinome may simplify functional studies of protein kinases. Through data mining of transcriptomes in C. gigas, we identified part of the kinome which may be central during development and may play a role in response to various environmental factors. Overall, this work contributes to a better understanding of key sensing pathways that may be central for adaptation to a highly dynamic marine environment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The European Multidisciplinary Seafloor and water-column Observatory (EMSO) European Research Infrastructure Consortium (ERIC) provides power, communications, sensors, and data infrastructure for continuous, high-resolution, (near-)real-time, interactive ocean observations across a multidisciplinary and interdisciplinary range of research areas including biology, geology, chemistry, physics, engineering, and computer science, from polar to subtropical environments, through the water column down to the abyss. Eleven deep-sea and four shallow nodes span from the Arctic through the Atlantic and Mediterranean, to the Black Sea. Coordination among the consortium nodes is being strengthened through the EMSOdev project (H2020), which will produce the EMSO Generic Instrument Module (EGIM). Early installations are now being upgraded, for example, at the Ligurian, Ionian, Azores, and Porcupine Abyssal Plain (PAP) nodes. Significant findings have been flowing in over the years; for example, high-frequency surface and subsurface water-column measurements of the PAP node show an increase in seawater pCO2 (from 339 μatm in 2003 to 353 μatm in 2011) with little variability in the mean air-sea CO2 flux. In the Central Eastern Atlantic, the Oceanic Platform of the Canary Islands open-ocean canary node (aka ESTOC station) has a long-standing time series on water column physical, biogeochemical, and acidification processes that have contributed to the assessment efforts of the Intergovernmental Panel on Climate Change (IPCC). EMSO not only brings together countries and disciplines but also allows the pooling of resources and coordination to assemble harmonized data into a comprehensive regional ocean picture, which will then be made available to researchers and stakeholders worldwide on an open and interoperable access basis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Every Argo data file submitted by a DAC for distribution on the GDAC has its format and data consistency checked by the Argo FileChecker. Two types of checks are applied: 1. Format checks. These ensure that the file formats match the Argo standards precisely. 2. Data consistency checks. Additional data consistency checks are performed on a file after it passes the format checks. These checks do not duplicate any of the quality control checks performed elsewhere. They can be thought of as “sanity checks” to ensure that the data are consistent with each other. The data consistency checks enforce data standards and ensure that certain data values are reasonable and/or consistent with other information in the files. Examples of the “data standard” checks are the “mandatory parameters” defined for meta-data files and the technical parameter names in technical data files. Files with format or consistency errors are rejected by the GDAC and are not distributed. Less serious problems generate warnings, and the file is still distributed on the GDAC. Reference Tables and Data Standards: Many of the consistency checks involve comparing the data to the published reference tables and data standards. These tables are documented in the User’s Manual. (The FileChecker implements “text versions” of these tables.)
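The two-stage logic described above (format checks first, consistency checks only on files that pass, errors causing rejection and warnings not) can be sketched as follows. This is not the Argo FileChecker: the record layout, the names `REQUIRED_KEYS` and `MANDATORY_PARAMETERS`, the parameter names `PARAM_A` and `PARAM_B`, and the choice of which problems count as warnings are all hypothetical placeholders for the published reference tables.

```python
from dataclasses import dataclass, field

# Assumed record layout and a hypothetical reference table.
REQUIRED_KEYS = ("file_type", "format_version", "parameters")
MANDATORY_PARAMETERS = {"meta": {"PARAM_A", "PARAM_B"}}


@dataclass
class CheckReport:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def status(self):
        # Any error rejects the file; warnings alone do not.
        return "rejected" if self.errors else "accepted"


def check_format(record, report):
    """Stage 1: the record must contain every required field."""
    for key in REQUIRED_KEYS:
        if key not in record:
            report.errors.append(f"missing field: {key}")


def check_consistency(record, report):
    """Stage 2: compare the record against the reference table."""
    mandatory = MANDATORY_PARAMETERS.get(record["file_type"], set())
    for name in sorted(mandatory - set(record["parameters"])):
        report.errors.append(f"missing mandatory parameter: {name}")
    for name, value in record["parameters"].items():
        if value == "":
            # Assumption: an empty value is a less serious problem.
            report.warnings.append(f"empty value for parameter: {name}")


def check_file(record):
    report = CheckReport()
    check_format(record, report)
    if not report.errors:  # consistency checks run only after format passes
        check_consistency(record, report)
    return report
```

A record with every mandatory parameter present but one empty value would come back as `accepted` with a single warning, which mirrors the statement that files with less serious problems are still distributed.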

Relevance:

100.00% 100.00%

Publisher:

Abstract:

By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as “networks on wheels”, can greatly enhance traffic safety, traffic efficiency and the driving experience for intelligent transportation systems (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, pose critical challenges to the efficiency and reliability of VANET implementations. This dissertation is motivated by the great application potential of VANETs in the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination and data collection, this dissertation research aims to enhance traffic safety and traffic efficiency, as well as to develop novel commercial applications, based on VANETs, following four aspects: 1) accurate and efficient message aggregation to detect on-road safety-relevant events, 2) reliable data dissemination to notify remote vehicles, 3) efficient and reliable spatial data collection from vehicular sensors, and 4) novel promising applications to exploit the commercial potential of VANETs. Specifically, to enable cooperative detection of safety-relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The relative position based message dissemination (RPB-MD) scheme is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone-of-relevance under varying traffic density. Because of the abundant vehicular sensor data available through VANETs, the compressive sampling based data collection (CS-DC) scheme is proposed to efficiently collect the spatial relevance data on a large scale, especially in dense traffic.
In addition, with novel and efficient solutions proposed for the application-specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit their commercial potential, namely general purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET-based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability of in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation will help push VANETs further towards the stage of massive deployment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Analyzing the characteristics and predicting the dynamic behaviors of complex systems over time crucially requires comprehensive research to enable the development of systems that can intelligently adapt to evolving conditions and infer new knowledge with algorithms that are not predesigned. This dissertation research studies the integration of techniques and methodologies from the fields of pattern recognition, intelligent agents, artificial immune systems, and distributed computing platforms, to create technologies that can more accurately describe and control the dynamics of real-world complex systems. The need for such technologies is emerging in manufacturing, transportation, hazard mitigation, weather and climate prediction, homeland security, and emergency response. Motivated by the ability of mobile agents to dynamically incorporate additional computational and control algorithms into executing applications, mobile agent technology is employed in this research for adaptive sensing and monitoring in a wireless sensor network. Mobile agents are software components that can travel from one computing platform to another in a network and carry the programs and data states that are needed for performing the assigned tasks. To support the generation, migration, communication, and management of mobile monitoring agents, an embeddable mobile agent system (Mobile-C) is integrated with sensor nodes. Mobile monitoring agents visit distributed sensor nodes, read real-time sensor data, and perform anomaly detection using the pattern recognition algorithms they are equipped with. The optimal control of agents is achieved by mimicking the adaptive immune response and applying multi-objective optimization algorithms. The mobile agent approach has the potential to reduce the communication load and energy consumption in monitoring networks.
The major research work of this dissertation project includes: (1) studying effective feature extraction methods for time series measurement data; (2) investigating the impact of the feature extraction methods and dissimilarity measures on the performance of pattern recognition; (3) researching the effects of environmental factors on the performance of pattern recognition; (4) integrating an embeddable mobile agent system with wireless sensor nodes; (5) optimizing agent generation and distribution using artificial immune system concepts and multi-objective algorithms; (6) applying mobile agent technology and pattern recognition algorithms to adaptive structural health monitoring and driving cycle pattern recognition; (7) developing a web-based monitoring network to enable the remote visualization and analysis of real-time sensor data. Techniques and algorithms developed in this dissertation project will contribute to research advances in networked distributed systems operating under changing environments.
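Items (1) and (2) above pair feature extraction with a dissimilarity measure. A minimal sketch of that pairing is given below, assuming three simple window statistics as features and Euclidean distance as the dissimilarity measure. The abstract does not say which features or measures the dissertation uses, so the function names and the threshold rule are illustrative only.

```python
import math
import statistics


def extract_features(window):
    """Summarise a window of measurements as (mean, std dev, peak-to-peak)."""
    return (
        statistics.fmean(window),
        statistics.pstdev(window),
        max(window) - min(window),
    )


def dissimilarity(features_a, features_b):
    """Euclidean distance between two feature vectors (one possible choice)."""
    return math.dist(features_a, features_b)


def is_anomalous(window, reference_features, threshold):
    """Flag a window whose features lie far from the reference pattern."""
    return dissimilarity(extract_features(window), reference_features) > threshold
```

In an agent-based setting, a function like `is_anomalous` would run on the sensor node itself, so that only the verdict rather than the raw time series needs to travel over the network.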

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver the Quality of Service that users expect. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-source Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running on top of the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly comprises techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands (18.91% more accurate) and efficiently allocate resources to meet the QoS target: it improves the QoS by 26.19% and reduces resource usage by 20.83% compared to traditional peak load-based resource allocation.
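To make the Top-k Spatial Boolean Queries mentioned above concrete, here is a brute-force sketch under a common reading of the term: return the k objects nearest to a query location whose keywords satisfy a Boolean filter. It scans every point and uses no spatial index, so it illustrates only the query semantics and not how sksOpen processes such queries. The function name and the tuple layout are assumptions.

```python
import heapq
import math


def top_k_spatial_boolean(points, query_xy, required, excluded, k):
    """Return names of the k nearest points that pass the keyword filter.

    points:   iterable of (name, (x, y), keywords) tuples
    required: keywords that must all be present (AND)
    excluded: keywords that must all be absent (NOT)
    """
    required, excluded = set(required), set(excluded)
    candidates = (
        (math.dist(query_xy, xy), name)
        for name, xy, keywords in points
        if required <= set(keywords) and not (excluded & set(keywords))
    )
    # nsmallest keeps only the k closest (distance, name) pairs.
    return [name for _, name in heapq.nsmallest(k, candidates)]
```

A dedicated index earns its cost by pruning both by location and by keyword before any distances are computed, which this linear scan cannot do.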

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Ensemble Stream Modeling and Data-cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing. There are two reasons for this: one, the histogram of fire activity is highly skewed; two, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher order features. A sensitive variance measure such as the F-test is performed during each node’s split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
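For reference, the F-measure named in this abstract is conventionally defined as the weighted harmonic mean of precision and recall. The sketch below computes it from raw counts. The function name, the argument names and the convention of returning 0.0 when the score is undefined are choices made here, and how the authors derive a false alarm rate from the score is not stated in the abstract.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        return 0.0  # convention chosen here for undefined precision or recall
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

False positives correspond to false alarms here: each one lowers precision and therefore the F-measure, which is why the score is sensitive to over-reporting of fire events.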

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Healthcare systems have assimilated information and communication technologies in order to improve the quality of healthcare and the patient's experience at reduced costs. The increasing digitalization of people's health information, however, raises new threats regarding information security and privacy. Accidental or deliberate breaches of health data may lead to societal pressures, embarrassment and discrimination. Information security and privacy are paramount to achieving high quality healthcare services and, further, to not harming individuals when providing care. With that in mind, we give special attention to the category of Mobile Health (mHealth) systems, that is, the use of mobile devices (e.g., mobile phones, sensors, PDAs) to support medical and public health. Such systems have been particularly successful in developing countries, taking advantage of the flourishing mobile market and the need to expand the coverage of primary healthcare programs. Many mHealth initiatives, however, fail to address security and privacy issues. This, coupled with the lack of specific legislation for privacy and data protection in these countries, increases the risk of harm to individuals. The overall objective of this thesis is to enhance knowledge regarding the design of security and privacy technologies for mHealth systems. In particular, we deal with mHealth Data Collection Systems (MDCSs), which consist of mobile devices for collecting and reporting health-related data, replacing paper-based approaches for health surveys and surveillance. This thesis consists of publications contributing to mHealth security and privacy in various ways: a comprehensive literature review about mHealth in Brazil; the design of a security framework for MDCSs (SecourHealth); the design of an MDCS (GeoHealth); the design of a Privacy Impact Assessment template for MDCSs; and the study of ontology-based obfuscation and anonymisation functions for health data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A global Italian pharmaceutical company has to provide two work environments that serve different needs. The environments will allow solutions to be developed in a controlled, secure and at the same time independent manner on a state-of-the-art enterprise cloud platform. The need to develop two different environments is dictated by the needs of the working units. The first environment is designed to facilitate the creation of applications related to genomics and is therefore aimed more at data scientists. This environment is capable of consuming, producing, retrieving and incorporating data; furthermore, it will support the programming languages most used for genomic applications (e.g., Python, R). The proposal was to obtain a pool of ready-to-go Virtual Machines with different architectures, to provide the best performance for the job that needs to be carried out. The second environment is more traditional in character: it obtains, via an ETL (Extract-Transform-Load) process, a global data model resembling a classical relational structure. It will provide major BI operations (e.g., analytics, performance measures, reports, etc.) that can be leveraged both for application analysis and for internal usage. Since both architectures will maintain large amounts of data regarding not only pharmaceutical information but also internal company information, it will be possible to digest the data with reporting/analytics tools and also to apply data-mining and machine learning technologies to exploit intrinsic information. The thesis introduces the proposals, implementations, descriptions of the technologies/platforms used, and future work for the environments discussed above.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Universidade Estadual de Campinas. Faculdade de Educação Física

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The productivity associated with the disassembly methods commonly available today seldom makes disassembly the preferred end-of-life solution for massive take-back product streams. Systematic reuse of parts or components, or recycling of pure material fractions, is often not achievable in an economically sustainable way. In this paper a case-based review of current disassembly practices is used to analyse the factors influencing disassembly feasibility. Data mining techniques were used to identify major factors influencing the profitability of disassembly operations. Case characteristics such as involvement of the product manufacturer in the end-of-life treatment and continuous ownership are some of the important dimensions. Economic models demonstrate that the efficiency of disassembly operations should be increased by an order of magnitude to assure the competitiveness of ecologically preferred, disassembly-oriented end-of-life scenarios for large waste of electric and electronic equipment (WEEE) streams. Technological means available to increase the productivity of disassembly operations are summarized. Automated disassembly techniques can contribute to the robustness of the process, but cannot overcome the efficiency gap unless combined with appropriate product design measures. Innovative, reversible joints, collectively activated by external trigger signals, form a promising approach to low-cost, mass disassembly in this context. A short overview of the state of the art in the development of such self-disassembling joints is included. (c) 2008 CIRP.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Several aspects of photoperception and light signal transduction have been elucidated by studies with model plants. However, the information available for economically important crops, such as Fabaceae species, is scarce. In order to incorporate the existing genomic tools into a strategy to advance soybean research, we have investigated publicly available expressed sequence tag (EST) sequence databases to identify Glycine max sequences related to genes involved in light-regulated developmental control in model plants. Approximately 38,000 sequences from open-access databases were investigated, and all bona fide and putative photoreceptor gene families were found in soybean sequence databases. We have identified G. max orthologs for several families of transcriptional regulators and cytoplasmic proteins mediating photoreceptor-induced responses, although some important Arabidopsis phytochrome-signaling components are absent. Moreover, soybean and Arabidopsis gene-family homologs appear to have undergone a distinct expansion process in some cases. We propose a working model of light perception, signal transduction and response-eliciting in G. max, based on the key components identified from Arabidopsis. These results demonstrate the power of comparative genomics between model systems and crop species to elucidate several aspects of plant physiology and metabolism.