893 resultados para data analysis: algorithms and implementation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed as too time consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data. Results: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low-overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. Conclusions: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Personalized recommender systems aim to assist users in retrieving and accessing interesting items by automatically acquiring user preferences from the historical data and matching items with the preferences. In the last decade, recommendation services have gained great attention due to the problem of information overload. However, despite recent advances of personalization techniques, several critical issues in modern recommender systems have not been well studied. These issues include: (1) understanding the accessing patterns of users (i.e., how to effectively model users' accessing behaviors); (2) understanding the relations between users and other objects (i.e., how to comprehensively assess the complex correlations between users and entities in recommender systems); and (3) understanding the interest change of users (i.e., how to adaptively capture users' preference drift over time). To meet the needs of users in modern recommender systems, it is imperative to provide solutions to address the aforementioned issues and apply the solutions to real-world applications. ^ The major goal of this dissertation is to provide integrated recommendation approaches to tackle the challenges of the current generation of recommender systems. In particular, three user-oriented aspects of recommendation techniques were studied, including understanding accessing patterns, understanding complex relations and understanding temporal dynamics. To this end, we made three research contributions. First, we presented various personalized user profiling algorithms to capture click behaviors of users from both coarse- and fine-grained granularities; second, we proposed graph-based recommendation models to describe the complex correlations in a recommender system; third, we studied temporal recommendation approaches in order to capture the preference changes of users, by considering both long-term and short-term user profiles. In addition, a versatile recommendation framework was proposed, in which the proposed recommendation techniques were seamlessly integrated. Different evaluation criteria were implemented in this framework for evaluating recommendation techniques in real-world recommendation applications. ^ In summary, the frequent changes of user interests and item repository lead to a series of user-centric challenges that are not well addressed in the current generation of recommender systems. My work proposed reasonable solutions to these challenges and provided insights on how to address these challenges using a simple yet effective recommendation framework.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cloud computing enables independent end users and applications to share data and pooled resources, possibly located in geographically distributed Data Centers, in a fully transparent way. This need is particularly felt by scientific applications to exploit distributed resources in efficient and scalable way for the processing of big amount of data. This paper proposes an open so- lution to deploy a Platform as a service (PaaS) over a set of multi- site data centers by applying open source virtualization tools to facilitate operation among virtual machines while optimizing the usage of distributed resources. An experimental testbed is set up in Openstack environment to obtain evaluations with different types of TCP sample connections to demonstrate the functionality of the proposed solution and to obtain throughput measurements in relation to relevant design parameters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thermodynamic stability measurements on proteins and protein-ligand complexes can offer insights not only into the fundamental properties of protein folding reactions and protein functions, but also into the development of protein-directed therapeutic agents to combat disease. Conventional calorimetric or spectroscopic approaches for measuring protein stability typically require large amounts of purified protein. This requirement has precluded their use in proteomic applications. Stability of Proteins from Rates of Oxidation (SPROX) is a recently developed mass spectrometry-based approach for proteome-wide thermodynamic stability analysis. Since the proteomic coverage of SPROX is fundamentally limited by the detection of methionine-containing peptides, the use of tryptophan-containing peptides was investigated in this dissertation. A new SPROX-like protocol was developed that measured protein folding free energies using the denaturant dependence of the rate at which globally protected tryptophan and methionine residues are modified with dimethyl (2-hydroxyl-5-nitrobenzyl) sulfonium bromide and hydrogen peroxide, respectively. This so-called Hybrid protocol was applied to proteins in yeast and MCF-7 cell lysates and achieved a ~50% increase in proteomic coverage compared to probing only methionine-containing peptides. Subsequently, the Hybrid protocol was successfully utilized to identify and quantify both known and novel protein-ligand interactions in cell lysates. The ligands under study included the well-known Hsp90 inhibitor geldanamycin and the less well-understood omeprazole sulfide that inhibits liver-stage malaria. In addition to protein-small molecule interactions, protein-protein interactions involving Puf6 were investigated using the SPROX technique in comparative thermodynamic analyses performed on wild-type and Puf6-deletion yeast strains. A total of 39 proteins were detected as Puf6 targets and 36 of these targets were previously unknown to interact with Puf6. Finally, to facilitate the SPROX/Hybrid data analysis process and minimize human errors, a Bayesian algorithm was developed for transition midpoint assignment. In summary, the work in this dissertation expanded the scope of SPROX and evaluated the use of SPROX/Hybrid protocols for characterizing protein-ligand interactions in complex biological mixtures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Improving lava flow hazard assessment is one of the most important and challenging fields of volcanology, and has an immediate and practical impact on society. Here, we present a methodology for the quantitative assessment of lava flow hazards based on a combination of field data, numerical simulations and probability analyses. With the extensive data available on historic eruptions of Mt. Etna, going back over 2000 years, it has been possible to construct two hazard maps, one for flank and the other for summit eruptions, allowing a quantitative analysis of the most likely future courses of lava flows. The effective use of hazard maps of Etna may help in minimizing the damage from volcanic eruptions through correct land use in densely urbanized area with a population of almost one million people. Although this study was conducted on Mt. Etna, the approach used is designed to be applicable to other volcanic areas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, novel analog-to-digital and digital-to-analog generalized time-interleaved variable bandpass sigma-delta modulators are designed, analysed, evaluated and implemented that are suitable for high performance data conversion for a broad-spectrum of applications. These generalized time-interleaved variable bandpass sigma-delta modulators can perform noise-shaping for any centre frequency from DC to Nyquist. The proposed topologies are well-suited for Butterworth, Chebyshev, inverse-Chebyshev and elliptical filters, where designers have the flexibility of specifying the centre frequency, bandwidth as well as the passband and stopband attenuation parameters. The application of the time-interleaving approach, in combination with these bandpass loop-filters, not only overcomes the limitations that are associated with conventional and mid-band resonator-based bandpass sigma-delta modulators, but also offers an elegant means to increase the conversion bandwidth, thereby relaxing the need to use faster or higher-order sigma-delta modulators. A step-by-step design technique has been developed for the design of time-interleaved variable bandpass sigma-delta modulators. Using this technique, an assortment of lower- and higher-order single- and multi-path generalized A/D variable bandpass sigma-delta modulators were designed, evaluated and compared in terms of their signal-to-noise ratios, hardware complexity, stability, tonality and sensitivity for ideal and non-ideal topologies. Extensive behavioural-level simulations verified that one of the proposed topologies not only used fewer coefficients but also exhibited greater robustness to non-idealties. Furthermore, second-, fourth- and sixth-order single- and multi-path digital variable bandpass digital sigma-delta modulators are designed using this technique. The mathematical modelling and evaluation of tones caused by the finite wordlengths of these digital multi-path sigmadelta modulators, when excited by sinusoidal input signals, are also derived from first principles and verified using simulation and experimental results. The fourth-order digital variable-band sigma-delta modulator topologies are implemented in VHDL and synthesized on Xilinx® SpartanTM-3 Development Kit using fixed-point arithmetic. Circuit outputs were taken via RS232 connection provided on the FPGA board and evaluated using MATLAB routines developed by the author. These routines included the decimation process as well. The experiments undertaken by the author further validated the design methodology presented in the work. In addition, a novel tunable and reconfigurable second-order variable bandpass sigma-delta modulator has been designed and evaluated at the behavioural-level. This topology offers a flexible set of choices for designers and can operate either in single- or dual-mode enabling multi-band implementations on a single digital variable bandpass sigma-delta modulator. This work is also supported by a novel user-friendly design and evaluation tool that has been developed in MATLAB/Simulink that can speed-up the design, evaluation and comparison of analog and digital single-stage and time-interleaved variable bandpass sigma-delta modulators. This tool enables the user to specify the conversion type, topology, loop-filter type, path number and oversampling ratio.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using the NEODAAS-Dundee AVHRR receiving station (Scotland), NEODAAS-Plymouth can provide calibrated brightness temperature data to end users or interim users in near-real time. Between 2000 and 2009 these data were used to undertake volcano hot spot detection, reporting and time-average discharge rate dissemination during effusive crises at Mount Etna and Stromboli (Italy). Data were passed via FTP, within an hour of image generation, to the hot spot detection system maintained at Hawaii Institute of Geophysics and Planetology (HIGP, University of Hawaii at Manoa, Honolulu, USA). Final product generation and quality control were completed manually at HIGP once a day, so as to provide information to onsite monitoring agencies for their incorporation into daily reporting duties to Italian Civil Protection. We here describe the processing and dissemination chain, which was designed so as to provide timely, useable, quality-controlled and relevant information for ‘one voice’ reporting by the responsible monitoring agencies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using the NEODAAS-Dundee AVHRR receiving station (Scotland), NEODAAS-Plymouth can provide calibrated brightness temperature data to end users or interim users in near-real time. Between 2000 and 2009 these data were used to undertake volcano hot spot detection, reporting and time-average discharge rate dissemination during effusive crises at Mount Etna and Stromboli (Italy). Data were passed via FTP, within an hour of image generation, to the hot spot detection system maintained at Hawaii Institute of Geophysics and Planetology (HIGP, University of Hawaii at Manoa, Honolulu, USA). Final product generation and quality control were completed manually at HIGP once a day, so as to provide information to onsite monitoring agencies for their incorporation into daily reporting duties to Italian Civil Protection. We here describe the processing and dissemination chain, which was designed so as to provide timely, useable, quality-controlled and relevant information for ‘one voice’ reporting by the responsible monitoring agencies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to calibration of geochemical data and multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in ‘raw-format’, requiring data processing in order to remove instrument bias, as well as informed sequence interpretation. The applicability of these conventional calibration equations to core-scanning XRF data are further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences focusing mainly on energy dispersive-XRF (ED-XRF) core-scanning. This study has applied high resolution core-scanning XRF to Holocene sedimentary sequences from the tidal-dominated Indian Sundarbans, (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Market research is often conducted through conventional methods such as surveys, focus groups and interviews. But the drawbacks of these methods are that they can be costly and timeconsuming. This study develops a new method, based on a combination of standard techniques like sentiment analysis and normalisation, to conduct market research in a manner that is free and quick. The method can be used in many application-areas, but this study focuses mainly on the veganism market to identify vegan food preferences in the form of a profile. Several food words are identified, along with their distribution between positive and negative sentiments in the profile. Surprisingly, non-vegan foods such as cheese, cake, milk, pizza and chicken dominate the profile, indicating that there is a significant market for vegan-suitable alternatives for such foods. Meanwhile, vegan-suitable foods such as coconut, potato, blueberries, kale and tofu also make strong appearances in the profile. Validation is performed by using the method on Volkswagen vehicle data to identify positive and negative sentiment across five car models. Some results were found to be consistent with sales figures and expert reviews, while others were inconsistent. The reliability of the method is therefore questionable, so the results should be used with caution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

1. Genomewide association studies (GWAS) enable detailed dissections of the genetic basis for organisms' ability to adapt to a changing environment. In long-term studies of natural populations, individuals are often marked at one point in their life and then repeatedly recaptured. It is therefore essential that a method for GWAS includes the process of repeated sampling. In a GWAS, the effects of thousands of single-nucleotide polymorphisms (SNPs) need to be fitted and any model development is constrained by the computational requirements. A method is therefore required that can fit a highly hierarchical model and at the same time is computationally fast enough to be useful. 2. Our method fits fixed SNP effects in a linear mixed model that can include both random polygenic effects and permanent environmental effects. In this way, the model can correct for population structure and model repeated measures. The covariance structure of the linear mixed model is first estimated and subsequently used in a generalized least squares setting to fit the SNP effects. The method was evaluated in a simulation study based on observed genotypes from a long-term study of collared flycatchers in Sweden. 3. The method we present here was successful in estimating permanent environmental effects from simulated repeated measures data. Additionally, we found that especially for variable phenotypes having large variation between years, the repeated measurements model has a substantial increase in power compared to a model using average phenotypes as a response. 4. The method is available in the R package RepeatABEL. It increases the power in GWAS having repeated measures, especially for long-term studies of natural populations, and the R implementation is expected to facilitate modelling of longitudinal data for studies of both animal and human populations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The object of this report is to present the data and conclusions drawn from the analysis of the origin and destination information. Comments on the advisability and correctness of the approach used by Iowa are encouraged.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-08