18 results for Data Extraction

in CentAUR: Central Archive University of Reading - UK


Relevance:

70.00%

Publisher:

Abstract:

Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and to analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intrarray variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identifies an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and interarray variability, and a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
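
As a rough illustration of the variability estimates quoted above, the sketch below computes an inter-array SD from synthetic replicate log2 signals; the array shapes, signal range and noise level are invented for the example, not taken from the paper.

```python
import numpy as np

# Hypothetical replicate data: rows = genes, columns = replicate arrays,
# values = log2 signal intensities (all values are illustrative).
rng = np.random.default_rng(0)
true_signal = rng.uniform(6, 14, size=1000)              # per-gene mean log2 signal
arrays = true_signal[:, None] + rng.normal(0, 0.5, size=(1000, 4))

# Inter-array variability: SD of each gene's log2 signal across replicates.
interarray_sd = arrays.std(axis=1, ddof=1).mean()

# Expressed as a percentage of the mean log2 signal, as in the abstract.
percent_of_mean = 100 * interarray_sd / arrays.mean()
print(f"inter-array SD ~ {interarray_sd:.2f} log2 units ({percent_of_mean:.1f}% of mean)")
```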

Relevance:

60.00%

Publisher:

Abstract:

Insect returns from the UK's Doppler weather radars were collected in the summers of 2007 and 2008, to ascertain their usefulness in providing information about boundary layer winds. Such observations could be assimilated into numerical weather prediction models to improve forecasts of convective showers before precipitation begins. Significant numbers of insect returns were observed during daylight hours on a number of days through this period, when they were detected at up to 30 km range from the radars, and up to 2 km above sea level. The range of detectable insect returns was found to vary with time of year and temperature. There was also a very weak correlation with wind speed and direction. Use of a dual-polarized radar revealed that the insects did not orient themselves at random, but showed distinct evidence of common orientation on several days, sometimes at an angle to their direction of travel. Observation minus model background residuals of wind profiles showed greater bias and standard deviation than those of other wind measurement types, which may be due to the insects' headings/airspeeds and to imperfect data extraction. The method used here, similar to the Met Office's procedure for extracting precipitation returns, requires further development, as clutter contamination remained one of the largest error contributors. Wind observations derived from the insect returns would then be useful for data assimilation applications.
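
A minimal sketch of the observation-minus-background (O-B) statistics mentioned above, using invented wind components rather than real radar or model data:

```python
import numpy as np

# Illustrative O-B check for insect-derived winds: 'obs' are hypothetical
# radar-derived wind components; 'bg' is the model background equivalent.
rng = np.random.default_rng(1)
bg = rng.normal(5.0, 3.0, size=500)          # background u-wind (m/s)
obs = bg + rng.normal(0.8, 2.5, size=500)    # insect-derived obs with bias + noise

residuals = obs - bg
print(f"O-B bias: {residuals.mean():.2f} m/s, SD: {residuals.std(ddof=1):.2f} m/s")
```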

Relevance:

60.00%

Publisher:

Abstract:

Aim: Previous systematic reviews have found that drug-related morbidity accounts for 4.3% of preventable hospital admissions. None, however, has identified the drugs most commonly responsible for preventable hospital admissions. The aims of this study were to estimate the percentage of preventable drug-related hospital admissions, the most common drug causes of preventable hospital admissions and the most common underlying causes of preventable drug-related admissions. Methods: Bibliographic databases, reference lists from eligible articles and study authors were the sources of data. Seventeen prospective observational studies reporting the proportion of preventable drug-related hospital admissions, causative drugs and/or the underlying causes of hospital admissions were selected. Included studies used multiple reviewers and/or explicit criteria to assess the causality and preventability of hospital admissions. Two investigators abstracted data from all included studies using a purpose-made data extraction form. Results: Across 13 papers the median percentage of preventable drug-related admissions to hospital was 3.7% (range 1.4-15.4). Across nine papers the majority (51%) of preventable drug-related admissions involved antiplatelets (16%), diuretics (16%), nonsteroidal anti-inflammatory drugs (11%) or anticoagulants (8%). Across five studies the median proportion of preventable drug-related admissions associated with prescribing problems was 30.6% (range 11.1-41.8), with adherence problems 33.3% (range 20.9-41.7) and with monitoring problems 22.2% (range 0-31.3). Conclusions: Four drug groups account for more than half of preventable drug-related hospital admissions. Concentrating interventions on these drug groups could appreciably reduce the number of preventable drug-related admissions to hospital from primary care.
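
The pooled summary arithmetic is straightforward to reproduce; the sketch below recomputes a median and range from a hypothetical set of 13 study-level percentages, chosen only to echo the abstract's figures:

```python
import statistics

# Hypothetical stand-ins for the 13 papers' estimates of the percentage of
# admissions that were preventable and drug-related.
study_percentages = [1.4, 2.0, 2.5, 3.0, 3.3, 3.6, 3.7, 4.0, 4.8, 5.5, 7.0, 9.9, 15.4]

median_pct = statistics.median(study_percentages)
print(f"median {median_pct}% (range {min(study_percentages)}-{max(study_percentages)})")
```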

Relevance:

60.00%

Publisher:

Abstract:

Objectives: To clarify the role of growth monitoring in primary school children, including obesity, and to examine issues that might impact on the effectiveness and cost-effectiveness of such programmes. Data sources: Electronic databases were searched up to July 2005. Experts in the field were also consulted. Review methods: Data extraction and quality assessment were performed on studies meeting the review's inclusion criteria. The performance of growth monitoring to detect disorders of stature and obesity was evaluated against National Screening Committee (NSC) criteria. Results: In the 31 studies that were included in the review, there were no controlled trials of the impact of growth monitoring and no studies of the diagnostic accuracy of different methods for growth monitoring. Analysis of the studies that presented a 'diagnostic yield' of growth monitoring suggested that one-off screening might identify between 1:545 and 1:1793 new cases of potentially treatable conditions. Economic modelling suggested that growth monitoring is associated with health improvements [incremental cost per quality-adjusted life-year (QALY) of £9500] and indicated that monitoring was cost-effective 100% of the time over the given probability distributions for a willingness-to-pay threshold of £30,000 per QALY. Studies of obesity focused on the performance of body mass index against measures of body fat. A number of issues relating to human resources required for growth monitoring were identified, but data on attitudes to growth monitoring were extremely sparse. Preliminary findings from economic modelling suggested that primary prevention may be the most cost-effective approach to obesity management, but the model incorporated a great deal of uncertainty. Conclusions: This review has indicated the potential utility and cost-effectiveness of growth monitoring in terms of increased detection of stature-related disorders. It has also pointed strongly to the need for further research. Growth monitoring does not currently meet all NSC criteria. However, it is questionable whether some of these criteria can be meaningfully applied to growth monitoring given that short stature is not a disease in itself, but is used as a marker for a range of pathologies and as an indicator of general health status. Identification of effective interventions for the treatment of obesity is likely to be considered a prerequisite to any move from monitoring to a screening programme designed to identify individual overweight and obese children. Similarly, further long-term studies of the predictors of obesity-related co-morbidities in adulthood are warranted. A cluster randomised trial comparing growth monitoring strategies with no growth monitoring in the general population would most reliably determine the clinical effectiveness of growth monitoring. Studies of diagnostic accuracy, alongside evidence of effective treatment strategies, could provide an alternative approach. In this context, careful consideration would need to be given to target conditions and intervention thresholds. Diagnostic accuracy studies would require long-term follow-up of both short and normal children to determine sensitivity and specificity of growth monitoring.
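
As a worked check of the cost-effectiveness claim, the sketch below applies the standard net-monetary-benefit rule to the abstract's point estimates (£9500 per QALY against a £30,000 threshold); this is textbook cost-effectiveness arithmetic, not the review's actual probabilistic model:

```python
# Point estimates from the abstract; the decision rule is the standard one.
icer = 9500          # incremental cost per QALY gained (GBP)
threshold = 30000    # willingness-to-pay per QALY (GBP)

# An intervention is cost-effective when its ICER falls below the threshold,
# i.e. when the net monetary benefit per QALY (threshold - ICER) is positive.
nmb_per_qaly = threshold - icer
print(f"NMB per QALY: {nmb_per_qaly} GBP -> cost-effective: {nmb_per_qaly > 0}")
```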

Relevance:

40.00%

Publisher:

Abstract:

The extraction of design data for a lowpass dielectric multilayer with Tschebysheff performance is described. The extraction proceeds initially by analogy with electric-circuit design, and can then be given numerical refinement, which is also described. Agreement with the Tschebysheff desideratum is satisfactory. The multilayers extracted by this procedure are of fractional thickness, symmetric with regard to their central layers.
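
For reference, a minimal statement of the equal-ripple Chebyshev (Tschebysheff) lowpass target such a design aims at, in its standard filter-theory form (assumed here, not quoted from the paper):

\[
T(\omega) = \frac{1}{1 + \varepsilon^{2}\, T_{n}^{2}(\omega/\omega_{c})}
\]

where \(T_{n}\) is the Chebyshev polynomial of order \(n\), \(\varepsilon\) sets the passband ripple, and \(\omega_{c}\) is the cutoff frequency.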

Relevance:

30.00%

Publisher:

Abstract:

Tidal channel networks play an important role in the intertidal zone, exerting substantial control over the hydrodynamics and sediment transport of the region and hence over the evolution of the salt marshes and tidal flats. The study of the morphodynamics of tidal channels is currently an active area of research, and a number of theories have been proposed which require, for their validation, measurement of channels over extensive areas. Remotely sensed data provide a suitable means for such channel mapping. The paper describes a technique that may be adapted to extract tidal channels from either aerial photographs or LiDAR data separately, or from both types of data used together in a fusion approach. Application of the technique to channel extraction from LiDAR data has been described previously. However, aerial photographs of intertidal zones are much more commonly available than LiDAR data, and most LiDAR flights now involve acquisition of multispectral images to complement the LiDAR data. In view of this, the paper investigates the use of multispectral data for semi-automatic identification of tidal channels, firstly from aerial photographs or linescanner data alone, and secondly from fused linescanner and LiDAR data sets. A multi-level, knowledge-based approach is employed. The algorithm based on aerial photography can achieve a useful channel extraction, though it may fail to detect some of the smaller channels, partly because the spectral response of parts of the non-channel areas may be similar to that of the channels. The algorithm for channel extraction from fused LiDAR and spectral data gives an increased accuracy, though only slightly higher than that obtained using LiDAR data alone. The results illustrate the difficulty of developing a fully automated method, and justify the semi-automatic approach adopted.
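
A toy illustration of the fusion idea: flag pixels as channel candidates only where the spectral response and the LiDAR elevation agree. The bands, grids and thresholds are hypothetical, and the paper's actual method is a multi-level, knowledge-based algorithm rather than a single threshold rule.

```python
import numpy as np

rng = np.random.default_rng(2)
nir = rng.uniform(0, 1, size=(100, 100))              # near-infrared reflectance
elevation = rng.uniform(-0.5, 1.5, size=(100, 100))   # LiDAR height (m)

# Wet channels tend to be dark in the NIR and locally low-lying; require both.
spectral_mask = nir < 0.2
elevation_mask = elevation < 0.0
channel_candidates = spectral_mask & elevation_mask

print(f"{channel_candidates.sum()} candidate channel pixels")
```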

Relevance:

30.00%

Publisher:

Abstract:

Quantitative structure-activity relationships (QSARs) have been developed to optimise the choice of nitrogen heterocyclic molecules that can be used to separate the minor actinides, such as americium(III), from europium(III) in the aqueous PUREX raffinate of nuclear waste. Experimental data on distribution coefficients and separation factors (SFs) for 47 such ligands have been obtained and show SF values ranging from 0.61 to 100. The ligands were divided into a training set of 36 molecules to develop the QSAR and a test set of 11 molecules to validate it. Over 1500 molecular descriptors were calculated for each heterocycle, and a genetic algorithm was used to select the most appropriate for use in multiple regression equations. Equations were developed fitting the separation factors to 6-8 molecular descriptors, which gave r² values of >0.8 for the training set and >0.7 for the test set, thus showing good predictive quality. The descriptors used in the equations were primarily electronic and steric. These equations can be used to predict the separation factors of nitrogen heterocycles not yet synthesised and/or tested, and hence to identify the most efficient ligands for lanthanide and actinide separation.
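
A compact stand-in for the workflow described above: choose the descriptor subset that best fits the separation factors by linear regression. A real run would use a genetic algorithm over ~1500 descriptors; exhaustive search over a toy descriptor set keeps the sketch short, and all data below are synthetic.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n_ligands, n_desc = 36, 8
X = rng.normal(size=(n_ligands, n_desc))                      # molecular descriptors
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.2, n_ligands)   # log separation factor

def r_squared(cols):
    # Least-squares fit of y on the chosen descriptors plus an intercept.
    A = np.column_stack([X[:, list(cols)], np.ones(n_ligands)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

# Exhaustive search over descriptor pairs stands in for the GA selection step.
best = max(combinations(range(n_desc), 2), key=r_squared)
print(f"best descriptor pair: {best}, r^2 = {r_squared(best):.2f}")
```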

Relevance:

30.00%

Publisher:

Abstract:

Liquid chromatography-mass spectrometry (LC-MS) datasets can be compared or combined following chromatographic alignment. Here we describe a simple solution to the specific problem of aligning one LC-MS dataset and one LC-MS/MS dataset, acquired on separate instruments from an enzymatic digest of a protein mixture, using feature extraction and a genetic algorithm. First, the LC-MS dataset is searched within a few ppm of the calculated theoretical masses of peptides confidently identified by LC-MS/MS. A piecewise linear function is then fitted to these matched peptides using a genetic algorithm with a fitness function that is insensitive to incorrect matches but sufficiently flexible to adapt to the discrete shifts common when comparing LC datasets. We demonstrate the utility of this method by aligning ion trap LC-MS/MS data with accurate LC-MS data from an FTICR mass spectrometer and show how hybrid datasets can improve peptide and protein identification by combining the speed of the ion trap with the mass accuracy of the FTICR, similar to using a hybrid ion trap-FTICR instrument. We also show that the high resolving power of FTICR can improve precision and linear dynamic range in quantitative proteomics. The alignment software, msalign, is freely available as open source.
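
The alignment idea lends itself to a short sketch: fit a piecewise linear retention-time mapping between mass-matched peptides using an outlier-insensitive fitness. A crude random search stands in for the genetic algorithm; the times, knots and loss cap are invented for illustration, not msalign's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
rt_msms = np.sort(rng.uniform(0, 60, 200))               # LC-MS/MS retention times
rt_ms = 1.05 * rt_msms + 2.0 + rng.normal(0, 0.3, 200)   # shifted LC-MS times + noise
rt_ms[::20] += rng.uniform(5, 15, 10)                    # simulate incorrect matches

knots = np.linspace(0, 60, 5)

def fitness(knot_values):
    pred = np.interp(rt_msms, knots, knot_values)        # piecewise linear mapping
    err = np.abs(pred - rt_ms)
    return np.sum(np.minimum(err, 1.0))                  # capped loss ignores outliers

best, best_fit = knots.copy(), fitness(knots)
for _ in range(5000):                                    # crude random search
    cand = best + rng.normal(0, 0.5, knots.size)
    f = fitness(cand)
    if f < best_fit:
        best, best_fit = cand, f

print("fitted knot values:", np.round(best, 2))
```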

Relevance:

30.00%

Publisher:

Abstract:

Automatic indexing and retrieval of digital data pose major challenges. The main problem arises from the ever-increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system (Dynamic REtrieval Analysis and semantic metadata Management (DREAM)) designed to automatically and intelligently index huge repositories of special-effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM Demonstrator has been evaluated as deployed in the film post-production phase to support the process of storage, indexing and retrieval of large data sets of special-effects video clips as an exemplar application domain. This paper presents its performance and usability results and highlights the scope for future enhancements of the DREAM architecture, which has proven successful in its first and possibly most challenging proving ground, namely film production, where it is already in routine use within our test-bed partners' creative processes.

Relevance:

30.00%

Publisher:

Abstract:

This paper introduces a new neurofuzzy model construction and parameter estimation algorithm from observed finite data sets, based on a Takagi and Sugeno (T-S) inference mechanism and a new extended Gram-Schmidt orthogonal decomposition algorithm, for the modeling of a priori unknown dynamical systems in the form of a set of fuzzy rules. The first contribution of the paper is the introduction of a one-to-one mapping between a fuzzy rule-base and a model matrix feature subspace using the T-S inference mechanism. This link enables the numerical properties associated with a rule-based matrix subspace, the relationships amongst these matrix subspaces, and the correlation between the output vector and a rule-base matrix subspace, to be investigated and extracted as rule-based knowledge to enhance model transparency. The matrix subspace spanned by a fuzzy rule is initially derived as the input regression matrix multiplied by a weighting matrix that consists of the corresponding fuzzy membership functions over the training data set. Model transparency is explored by the derivation of an equivalence between an A-optimality experimental design criterion of the weighting matrix and the average model output sensitivity to the fuzzy rule, so that rule-bases can be effectively measured by their identifiability via the A-optimality experimental design criterion. The A-optimality experimental design criterion of the weighting matrices of fuzzy rules is used to construct an initial model rule-base. An extended Gram-Schmidt algorithm is then developed to estimate the parameter vector for each rule. This new algorithm decomposes the model rule-bases via an orthogonal subspace decomposition approach, so as to enhance model transparency with the capability of interpreting the derived rule-base energy level. This new approach is computationally simpler than the conventional Gram-Schmidt algorithm for resolving high-dimensional regression problems, whereby it is computationally desirable to decompose complex models into a few submodels rather than a single model with a large number of input variables and the associated curse of dimensionality problem. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.
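
The rule-ranking step can be sketched compactly: for each fuzzy rule, form the input regression matrix weighted by that rule's membership values and score it with the A-optimality criterion trace((MᵀM)⁻¹), smaller meaning the rule is better identified. The membership functions, rule centres and data below are hypothetical, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))                       # input regression matrix

def membership(x, centre, width=1.0):
    # Gaussian fuzzy membership of each sample to a rule centred at 'centre'.
    return np.exp(-np.sum((x - centre) ** 2, axis=1) / (2 * width ** 2))

rule_centres = [np.array([0, 0, 0]), np.array([2, 2, 2])]
for i, c in enumerate(rule_centres):
    W = np.diag(membership(X, c))                   # rule weighting matrix
    M = W @ X                                       # rule's matrix subspace
    a_opt = np.trace(np.linalg.inv(M.T @ M))        # A-optimality score
    print(f"rule {i}: A-optimality = {a_opt:.3f}")
```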

Relevance:

30.00%

Publisher:

Abstract:

A new robust neurofuzzy model construction algorithm has been introduced for the modeling of a priori unknown dynamical systems from observed finite data sets in the form of a set of fuzzy rules. Based on a Takagi-Sugeno (T-S) inference mechanism, a one-to-one mapping between a fuzzy rule base and a model matrix feature subspace is established. This link enables rule-based knowledge to be extracted from the matrix subspace to enhance model transparency. In order to achieve maximized model robustness and sparsity, a new robust extended Gram-Schmidt (G-S) method has been introduced via two effective and complementary approaches: regularization and D-optimality experimental design. Model rule bases are decomposed into orthogonal subspaces, so as to enhance model transparency with the capability of interpreting the derived rule-base energy level. A locally regularized orthogonal least squares algorithm, combined with a D-optimality criterion for subspace-based rule selection, has been extended for fuzzy rule regularization and subspace-based information extraction. By using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.
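
A companion sketch for the D-optimality variant above: score a rule's weighted subspace by det(MᵀM), larger being better conditioned, with a ridge-style term standing in for the paper's combined regularised/D-optimality construction. The data, memberships and regularisation weight are again invented.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))                # input regression matrix
weights = rng.uniform(0, 1, size=200)        # one rule's fuzzy memberships

M = np.diag(weights) @ X                     # rule's weighted subspace
lam = 1e-3                                   # illustrative regularisation weight
d_opt = np.linalg.det(M.T @ M + lam * np.eye(3))
print(f"regularised D-optimality score: {d_opt:.3f}")
```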

Relevance:

30.00%

Publisher:

Abstract:

We are developing computational tools supporting the detailed analysis of the dependence of neural electrophysiological response on dendritic morphology. We approach this problem by combining simulations of faithful models of neurons (experimental real-life morphological data with known models of channel kinetics) with algorithmic extraction of morphological and physiological parameters and statistical analysis. In this paper, we present a novel method for automatic recognition of spike trains in voltage traces, which eliminates the need for human intervention. This enables classification of waveforms with consistent criteria across all the analyzed traces, and so amounts to a reduction of the noise in the data. The method allows automatic extraction of the physiological parameters needed for further statistical analysis. To illustrate its usefulness for analyzing voltage traces, we characterized the influence of the somatic current injection level on several electrophysiological parameters in a set of modeled neurons. This application suggests that such algorithmic processing of physiological data extracts parameters in a form suitable for further investigation of the structure-activity relationship in single neurons.
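
A minimal sketch of automatic spike recognition in a voltage trace, using upward threshold crossings with a refractory period; the trace, threshold and refractory period are illustrative stand-ins, not the paper's actual classification criteria.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(0, 1.0, 1e-4)                       # 1 s sampled at 10 kHz
v = -65 + rng.normal(0, 1.0, t.size)              # resting potential + noise (mV)
for spike_time in (0.1, 0.35, 0.8):               # inject three synthetic spikes
    v[int(spike_time / 1e-4)] += 80

threshold, refractory = -20.0, 0.002              # mV, s
# Indices where the trace crosses the threshold from below.
crossings = np.where((v[:-1] < threshold) & (v[1:] >= threshold))[0]

spike_times, last = [], -np.inf
for idx in crossings:
    if t[idx] - last >= refractory:               # enforce the refractory period
        spike_times.append(t[idx])
        last = t[idx]
print("detected spikes at (s):", spike_times)
```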

Relevance:

30.00%

Publisher:

Abstract:

The organization of non-crystalline polymeric materials at a local level, namely on a spatial scale between a few and 100 Å, is still unclear in many respects. The determination of the local structure in terms of the configuration and conformation of the polymer chain, and of the packing characteristics of the chain in the bulk material, represents a challenging problem. Data from wide-angle diffraction experiments are very difficult to interpret due to the very large amount of information that they carry, that is, the large number of correlations present in the diffraction patterns. We describe new approaches that permit a detailed analysis of the complex neutron diffraction patterns characterizing polymer melts and glasses. The coupling of different computer modelling strategies with neutron scattering data over a wide Q range allows the extraction of detailed quantitative information on the structural arrangements of the materials of interest. Proceeding from modelling routes as diverse as force-field calculations, single-chain modelling and reverse Monte Carlo, we show the successes and pitfalls of each approach in describing model systems, which illustrate the need to attack the data analysis problem simultaneously from several fronts.
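
One of the modelling routes named above, reverse Monte Carlo, reduces to an acceptance step that can be sketched directly: perturb the model, recompute its structure factor, and accept or reject on the change in chi-squared against the measured S(Q). Everything below is a toy stand-in for real neutron data.

```python
import numpy as np

rng = np.random.default_rng(8)
q = np.linspace(0.5, 20, 50)
s_exp = np.sin(q) / q + 1.0                      # "measured" S(Q), synthetic
sigma = 0.02                                     # assumed experimental uncertainty

def chi2(s_model):
    return np.sum(((s_model - s_exp) / sigma) ** 2)

s_model = s_exp + rng.normal(0, 0.05, q.size)    # current model's S(Q)
old = chi2(s_model)
trial = s_model + rng.normal(0, 0.01, q.size)    # S(Q) after a trial atom move
new = chi2(trial)

# Metropolis-style rule: always accept improvements, sometimes accept worse.
if new < old or rng.random() < np.exp(-(new - old) / 2):
    s_model, old = trial, new
print(f"chi^2 after step: {old:.1f}")
```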