42 resultados para Genomic data integration


Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper describes the implementation of a 3D variational (3D-Var) data assimilation scheme for a morphodynamic model applied to Morecambe Bay, UK. A simple decoupled hydrodynamic and sediment transport model is combined with a data assimilation scheme to investigate the ability of such methods to improve the accuracy of the predicted bathymetry. The inverse forecast error covariance matrix is modelled using a Laplacian approximation which is calibrated for the length scale parameter required. Calibration is also performed for the Soulsby-van Rijn sediment transport equations. The data used for assimilation purposes comprises waterlines derived from SAR imagery covering the entire period of the model run, and swath bathymetry data collected by a ship-borne survey for one date towards the end of the model run. A LiDAR survey of the entire bay carried out in November 2005 is used for validation purposes. The comparison of the predictive ability of the model alone with the model-forecast-assimilation system demonstrates that using data assimilation significantly improves the forecast skill. An investigation of the assimilation of the swath bathymetry as well as the waterlines demonstrates that the overall improvement is initially large, but decreases over time as the bathymetry evolves away from that observed by the survey. The result of combining the calibration runs into a pseudo-ensemble provides a higher skill score than for a single optimized model run. A brief comparison of the Optimal Interpolation assimilation method with the 3D-Var method shows that the two schemes give similar results.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Brain activity can be measured with several non-invasive neuroimaging modalities, but each modality has inherent limitations with respect to resolution, contrast and interpretability. It is hoped that multimodal integration will address these limitations by using the complementary features of already available data. However, purely statistical integration can prove problematic owing to the disparate signal sources. As an alternative, we propose here an advanced neural population model implemented on an anatomically sound cortical mesh with freely adjustable connectivity, which features proper signal expression through a realistic head model for the electroencephalogram (EEG), as well as a haemodynamic model for functional magnetic resonance imaging based on blood oxygen level dependent contrast (fMRI BOLD). It hence allows simultaneous and realistic predictions of EEG and fMRI BOLD from the same underlying model of neural activity. As proof of principle, we investigate here the influence on simulated brain activity of strengthening visual connectivity. In the future we plan to fit multimodal data with this neural population model. This promises novel, model-based insights into the brain's activity in sleep, rest and task conditions.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background: Affymetrix GeneChip arrays are widely used for transcriptomic studies in a diverse range of species. Each gene is represented on a GeneChip array by a probe- set, consisting of up to 16 probe-pairs. Signal intensities across probe- pairs within a probe-set vary in part due to different physical hybridisation characteristics of individual probes with their target labelled transcripts. We have previously developed a technique to study the transcriptomes of heterologous species based on hybridising genomic DNA (gDNA) to a GeneChip array designed for a different species, and subsequently using only those probes with good homology. Results: Here we have investigated the effects of hybridising homologous species gDNA to study the transcriptomes of species for which the arrays have been designed. Genomic DNA from Arabidopsis thaliana and rice (Oryza sativa) were hybridised to the Affymetrix Arabidopsis ATH1 and Rice Genome GeneChip arrays respectively. Probe selection based on gDNA hybridisation intensity increased the number of genes identified as significantly differentially expressed in two published studies of Arabidopsis development, and optimised the analysis of technical replicates obtained from pooled samples of RNA from rice. Conclusion: This mixed physical and bioinformatics approach can be used to optimise estimates of gene expression when using GeneChip arrays.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The past years have shown an enormous advancement in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and different data scope pose a challenge to jointly visualize and integrate diverse data types. We present AmalgamScope a new interactive software tool focusing on assisting scientists with the annotation of the human genome and particularly the integration of the annotation files from multiple data types, using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to integration of the data based on the annotational information and visualization of the merged files within chromosomal regions or the whole genome. Additionally, users can define custom transcriptome library files for any species and use the file exchanging distant server options of the tool.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this paper is essentially twofold: first, to describe the use of spherical nonparametric estimators for determining statistical diagnostic fields from ensembles of feature tracks on a global domain, and second, to report the application of these techniques to data derived from a modern general circulation model. New spherical kernel functions are introduced that are more efficiently computed than the traditional exponential kernels. The data-driven techniques of cross-validation to determine the amount elf smoothing objectively, and adaptive smoothing to vary the smoothing locally, are also considered. Also introduced are techniques for combining seasonal statistical distributions to produce longer-term statistical distributions. Although all calculations are performed globally, only the results for the Northern Hemisphere winter (December, January, February) and Southern Hemisphere winter (June, July, August) cyclonic activity are presented, discussed, and compared with previous studies. Overall, results for the two hemispheric winters are in good agreement with previous studies, both for model-based studies and observational studies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to Mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Exact error estimates for evaluating multi-dimensional integrals are considered. An estimate is called exact if the rates of convergence for the low- and upper-bound estimate coincide. The algorithm with such an exact rate is called optimal. Such an algorithm has an unimprovable rate of convergence. The problem of existing exact estimates and optimal algorithms is discussed for some functional spaces that define the regularity of the integrand. Important for practical computations data classes are considered: classes of functions with bounded derivatives and Holder type conditions. The aim of the paper is to analyze the performance of two optimal classes of algorithms: deterministic and randomized for computing multidimensional integrals. It is also shown how the smoothness of the integrand can be exploited to construct better randomized algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a prototype grid infrastructure, called the eMinerals minigrid, for molecular simulation scientists. which is based on an integration of shared compute and data resources. We describe the key components, namely the use of Condor pools, Linux/Unix clusters with PBS and IBM's LoadLeveller job handling tools, the use of Globus for security handling, the use of Condor-G tools for wrapping globus job submit commands, Condor's DAGman tool for handling workflow, the Storage Resource Broker for handling data, and the CCLRC dataportal and associated tools for both archiving data with metadata and making data available to other workers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As integrated software solutions reshape project delivery, they alter the bases for collaboration and competition across firms in complex industries. This paper synthesises and extends literatures on strategy in project-based industries and digitally-integrated work to understand how project-based firms interact with digital infrastructures for project delivery. Four identified strategies are to: 1) develop and use capabilities to shape the integrated software solutions that are used in projects; 2) co-specialize, developing complementary assets to work repeatedly with a particular integrator firm; 3) retain flexibility by developing and maintaining capabilities in multiple digital technologies and processes; and 4) manage interfaces, translating work into project formats for coordination while hiding proprietary data and capabilities in internal systems. The paper articulates the strategic importance of digital infrastructures for delivery as well as product architectures. It concludes by discussing managerial implications of the identified strategies and areas for further research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a new approach to modelling flash floods in dryland catchments by integrating remote sensing and digital elevation model (DEM) data in a geographical information system (GIS). The spectral reflectance of channels affected by recent flash floods exhibit a marked increase, due to the deposition of fine sediments in these channels as the flood recedes. This allows the parts of a catchment that have been affected by a recent flood event to be discriminated from unaffected parts, using a time series of Landsat images. Using images of the Wadi Hudain catchment in southern Egypt, the hillslope areas contributing flow were inferred for different flood events. The SRTM3 DEM was used to derive flow direction, flow length, active channel cross-sectional areas and slope. The Manning Equation was used to estimate the channel flow velocities, and hence the time-area zones of the catchment. A channel reach that was active during a 1985 runoff event, that does not receive any tributary flow, was used to estimate a transmission loss rate of 7·5 mm h−1, given the maximum peak discharge estimate. Runoff patterns resulting from different flood events are quite variable; however the southern part of the catchment appears to have experienced more floods during the period of study (1984–2000), perhaps because the bedrock hillslopes in this area are more effective at runoff production than other parts of the catchment which are underlain by unconsolidated Quaternary sands and gravels. Due to high transmission loss, runoff generated within the upper reaches is rarely delivered to the alluvial fan and Shalateen city situated at the catchment outlet. The synthetic GIS-based time area zones, on their own, cannot be relied on to model the hydrographs reliably; physical parameters, such as rainfall intensity, distribution, and transmission loss, must also be considered.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This dissertation deals with aspects of sequential data assimilation (in particular ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy (EnKBF) filter is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and doesn’t represent further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using of deterministic ensemble square root filters in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can be reverted also due to nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble for different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests both at the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time stepping scheme; hence, no retuning of the parameterizations is required. It is found the accuracy of the medium-term forecasts is increased by using the RAW filter.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Escherichia coli O26 serogroup includes important food-borne pathogens associated with human and animal diarrheal disease. Current typing methods have revealed great genetic heterogeneity within the O26 group; the data are often inconsistent and focus only on verotoxin (VT)-positive O26 isolates. To improve current understanding of diversity within this serogroup, the genomic relatedness of VT-positive and -negative O26 strains was assessed by comparative genomic indexing. Our results clearly demonstrate that irrespective of virulence characteristics and pathotype designation, the O26 strains show greater genomic similarity to each other than to any other strain included in this study. Our data suggest that enteropathogenic and VT-expressing E. coli O26 strains represent the same clonal lineage and that W-expressing E. coli O26 strains have gained additional virulence characteristics. Using this approach, we established the core genes which are central to the E. coli species and identified regions of variation from the E. coli K-12 chromosomal backbone.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive and finding ways to enhance performance is a grid issue on its own. However, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.