926 results for Genomic data integration


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which eludes these problems by exploiting 'off-target' sequence reads. CopywriteR allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.
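
The idea of using off-target reads as a depth signal can be illustrated with a minimal sketch. This is not CopywriteR's implementation: the function names, the fixed bin size, the median baseline, and the pseudocount of 1 are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log2
from statistics import median

def off_target_bin_counts(read_positions, targets, bin_size):
    """Count reads per fixed-size bin, skipping reads inside any target interval.

    `targets` is a list of half-open (start, end) intervals (hypothetical format).
    """
    counts = Counter()
    for pos in read_positions:
        if any(start <= pos < end for start, end in targets):
            continue  # on-target read: ignored in this sketch
        counts[pos // bin_size] += 1
    return counts

def log2_ratios(counts, n_bins):
    """Log2 of each bin's depth relative to the median bin depth.

    A pseudocount of 1 avoids log2(0); this is an illustrative choice.
    """
    depths = [counts.get(b, 0) + 1 for b in range(n_bins)]
    baseline = median(depths)
    return [log2(d / baseline) for d in depths]

# Toy data: bin 1 is fully covered by a target, bin 3 has doubled depth.
reads = [5] * 10 + [150] * 50 + [250] * 10 + [350] * 20
counts = off_target_bin_counts(reads, targets=[(100, 200)], bin_size=100)
ratios = log2_ratios(counts, n_bins=4)
```

In this toy input the bin that lies entirely inside a target has no off-target reads at all, so a real pipeline would need to mask or normalize such bins rather than report them as losses.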

Relevance:

30.00% 30.00%

Publisher:

Abstract:

PURPOSE The implementation of genomic-based medicine is hindered by unresolved questions regarding data privacy and delivery of interpreted results to health-care practitioners. We used DNA-based prediction of HIV-related outcomes as a model to explore critical issues in clinical genomics. METHODS We genotyped 4,149 markers in HIV-positive individuals. Variants allowed for prediction of 17 traits relevant to HIV medical care, inference of patient ancestry, and imputation of human leukocyte antigen (HLA) types. Genetic data were processed under a privacy-preserving framework using homomorphic encryption, and clinical reports describing potentially actionable results were delivered to health-care providers. RESULTS A total of 230 patients were included in the study. We demonstrated the feasibility of encrypting a large number of genetic markers, inferring patient ancestry, computing monogenic and polygenic trait risks, and reporting results under privacy-preserving conditions. The average execution time of a multimarker test on encrypted data was 865 ms on a standard computer. The proportion of tests returning potentially actionable genetic results ranged from 0 to 54%. CONCLUSIONS The model of implementation presented herein informs on strategies to deliver genomic test results for clinical care. Data encryption to ensure privacy helps to build patient trust, a key requirement on the road to genomic-based medicine. Genet Med advance online publication 14 January 2016. Genetics in Medicine (2016); doi:10.1038/gim.2015.167.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The Indian textiles industry is now at a crossroads with the phasing out of the quota regime that prevailed under the Multi-Fiber Agreement (MFA) until the end of 2004. In the face of full integration of the textiles sector into the WTO, maintaining and enhancing productive efficiency is a precondition for the competitiveness of Indian firms in the new liberalized world market. In this paper we use data obtained from the Annual Survey of Industries for a number of years to measure the levels of technical efficiency in the Indian textiles industry at the firm level. We use both a grand frontier applicable to all firms and a group frontier specific to firms from any individual state, ownership, or organization type in order to evaluate their efficiencies. This permits us to separately identify how locational, proprietary, and organizational characteristics of a firm affect its performance.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Cancer cell lines can be treated with a drug, and the molecular comparison of responders and non-responders may yield potential predictors that could be tested in the clinic. Applying cell line-derived multivariable response predictors to patients in order to identify those who respond to therapy is a bioinformatics challenge. Using the gene expression data from 23 breast cancer cell lines, I developed three predictors of dasatinib sensitivity by selecting differentially expressed genes and applying different classification algorithms. The performance of these predictors was tested on independent cell lines with known dasatinib response. The predictor based on the weighted voting method had the best overall performance. It correctly predicted dasatinib sensitivity in 11 out of 12 (92%) breast and 17 out of 23 (74%) lung cancer cell lines. These predictors were then applied to the gene expression data from 133 breast cancer patients in an attempt to predict how the patients might respond to dasatinib therapy. Two predictors identified 13 patients in common as dasatinib sensitive. Sixty-two percent of these cases are triple negative (ER-negative, HER2-negative and PR-negative) and 76% are double negative. The result is consistent with the findings of other studies, which identified the target population for dasatinib treatment as the triple-negative or basal breast cancer subtype. In conclusion, the results suggest that the cell line-derived dasatinib classifiers can be applied to human patients.
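
The abstract does not spell out the weighted voting method. A minimal sketch, assuming the commonly used formulation in which each gene votes with a signal-to-noise weight around the midpoint of the two class means, might look as follows; the function names and toy expression values are hypothetical.

```python
from statistics import mean, stdev

def train_weighted_voting(sensitive, resistant):
    """Return one (weight, boundary) pair per gene.

    Each argument is a list of samples; each sample is a list of expression
    values in the same gene order. Assumes at least two samples per class and
    non-zero spread, otherwise the weight is undefined.
    """
    params = []
    for g in range(len(sensitive[0])):
        s = [sample[g] for sample in sensitive]
        r = [sample[g] for sample in resistant]
        weight = (mean(s) - mean(r)) / (stdev(s) + stdev(r))  # signal-to-noise
        boundary = (mean(s) + mean(r)) / 2                    # class midpoint
        params.append((weight, boundary))
    return params

def predict(params, sample):
    """Sum the per-gene votes; a positive total favors the sensitive class."""
    total = sum(w * (x - b) for (w, b), x in zip(params, sample))
    return "sensitive" if total > 0 else "resistant"

# Toy training data with two genes (values are made up for illustration).
sensitive_lines = [[5.0, 1.0], [6.0, 2.0]]
resistant_lines = [[1.0, 5.0], [2.0, 6.0]]
model = train_weighted_voting(sensitive_lines, resistant_lines)
```

With this toy model, `predict(model, [6.0, 1.0])` falls on the sensitive side and `predict(model, [1.0, 6.0])` on the resistant side. Gene selection, which the abstract mentions as a prior step, is not shown.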

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases. This research dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions. In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare. Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Making healthcare comprehensive and more efficient remains a complex challenge. Health Information Technology (HIT) is recognized as an important component of this transformation, but few studies describe HIT adoption and its effect on the bedside experience of physicians, staff and patients. This study applied descriptive statistics and correlation analysis to data from the Patient-Centered Medical Home National Demonstration Project (NDP) of the American Academy of Family Physicians. Thirty-six clinics were followed for 26 months by clinician/staff questionnaires and patient surveys. This study characterizes those clinics as well as staff and patient perspectives on HIT usefulness, the doctor-patient relationship, electronic medical record (EMR) implementation, and computer connections in the practice throughout the study. The Global Practice Experience (GPE) factor, a composite score related to key components of primary care, was then correlated to clinician and patient perspectives. This study found wide adoption of HIT among NDP practices. Patient perspectives on HIT helpfulness to the doctor-patient relationship showed a suggestive trend that approached statistical significance (p = 0.172). Clinicians and staff noted successful integration of EMR into clinic workflow, and their perception of helpfulness to the doctor-patient relationship showed a suggestive increase also approaching statistical significance (p = 0.06). GPE was correlated with clinician/staff assessment of a helpful doctor-patient relationship midway through the study (R = 0.460, p = 0.021), with the remaining time points nearing statistical significance. GPE was also correlated with both patient perspectives of EMR helpfulness in the doctor-patient relationship (R = 0.601, p = 0.001) and computer connections (R = 0.618, p = 0.0001) at the start of the study.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has fallen dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves with increasing sequencing depth. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model, with a lower false discovery rate. Section 2: Although a number of methods have been proposed to accurately analyze differential RNA expression at the gene level, modeling at the base pair level is required. Here, we find that the overdispersion rate decreases as the sequencing depth increases at the base pair level. We also propose four models and compare them with each other. As expected, our beta-binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously undiscovered bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamer that is used in priming.
We suggest a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume that the thresholds are specific to each gene. It is also intriguing to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this bias and the local sequence. Finally, we develop a powerful method to dichotomize methylation status and consequently identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.
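
To make the overdispersion idea concrete, the sketch below uses the standard beta-binomial variance, which inflates the binomial variance by a factor of 1 + (n - 1)ρ. The abstract does not give the form of the depth dependence, so `depth_dependent_rho` with its power-law decay and the parameters `rho0` and `gamma` are purely hypothetical.

```python
def beta_binomial_variance(n, p, rho):
    """Variance of a beta-binomial count with n trials, mean proportion p and
    overdispersion rho = 1 / (alpha + beta + 1). rho = 0 gives the binomial case."""
    return n * p * (1 - p) * (1 + (n - 1) * rho)

def depth_dependent_rho(n, rho0, gamma):
    """Hypothetical depth dependence: overdispersion decays as rho0 * n**(-gamma).

    This functional form is an assumption for illustration, not the
    dissertation's model.
    """
    return rho0 * n ** (-gamma)

# Compare a fixed overdispersion with one that shrinks as depth grows.
depth = 100
fixed = beta_binomial_variance(depth, 0.5, 0.1)
dynamic = beta_binomial_variance(depth, 0.5, depth_dependent_rho(depth, 0.1, 0.5))
```

Under these assumed parameters the depth-dependent variant yields a smaller variance than the fixed one at the same depth, which matches the direction of the reported trend (overdispersion decreasing with depth) but says nothing about its actual magnitude.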

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Sampling was conducted from March 24 to August 5, 2010, in the fjord branch Kapisigdlit, located in the inner part of the Godthåbsfjord system, West Greenland. The vessel "Lille Masik" was used during all cruises except on June 17-18, when sampling was done from RV Dana (National Institute for Aquatic Resources, Denmark). A total of 15 cruises (of 1-2 days' duration), 7-10 days apart, were carried out along a transect composed of 6 stations (St.) spanning the length of the 26 km long fjord branch. St. 1 was located at the mouth of the fjord branch and St. 6 at its end, in the middle of a shallower inner creek. St. 1-4 covered deeper parts of the fjord, and St. 5 was located on the slope leading up to the shallow inner creek. Mesozooplankton was sampled by vertical net tows using a Hydrobios Multinet (type Mini) equipped with a flow meter and 50 µm mesh nets, or a WP-2 net with 50 µm mesh size equipped with a non-filtering cod-end. Sampling was conducted at various times of day at the different stations. The nets were hauled at a speed of 0.2-0.3 m s⁻¹ from 100, 75 and 50 m depth to the surface at St. 2 + 4, 5 and 6, respectively. The content was immediately preserved in buffered formalin (4% final concentration). All samples were analyzed in the Plankton sorting and identification center in Szczecin (www.nmfri.gdynia.pl). Samples containing high numbers of zooplankton were split into subsamples. All copepods and other zooplankton were identified down to the lowest possible taxonomic level (approx. 400 per sample), length measured and counted. Copepods were sorted into development stages (nauplii stage 1 - copepodite stage 6) using morphological features and sizes, and up to 10 individuals of each stage were length measured.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Geostrophic surface velocities can be derived from the gradients of the mean dynamic topography, that is, the difference between the mean sea surface and the geoid. Therefore, independently observed mean dynamic topography data are valuable input parameters and constraints for ocean circulation models. For a successful fit to observational dynamic topography data, not only the mean dynamic topography on the particular ocean model grid is required, but also information about its inverse covariance matrix. The calculation of the mean dynamic topography from satellite-based gravity field models and altimetric sea surface height measurements, however, is not straightforward. For this purpose, we previously developed an integrated approach to combining these two different observation groups in a consistent way without using the common filter approaches (Becker et al. in J Geodyn 59(60):99-110, 2012, doi:10.1016/j.jog.2011.07.0069; Becker in Konsistente Kombination von Schwerefeld, Altimetrie und hydrographischen Daten zur Modellierung der dynamischen Ozeantopographie, 2012, http://nbn-resolving.de/nbn:de:hbz:5n-29199). Within this combination method, the full spectral range of the observations is considered. Further, it allows the direct determination of the normal equations (i.e., the inverse of the error covariance matrix) of the mean dynamic topography on arbitrary grids, which is one of the requirements for ocean data assimilation. In this paper, we report progress through selection and improved processing of altimetric data sets. We focus on the preprocessing steps of along-track altimetry data from Jason-1 and Envisat to obtain a mean sea surface profile. During this procedure, a rigorous variance propagation is accomplished, so that, for the first time, the full covariance matrix of the mean sea surface is available.
The combination of the mean profile and a combined GRACE/GOCE gravity field model yields a mean dynamic topography model for the North Atlantic Ocean that is characterized by a defined set of assumptions. We show that including the geodetically derived mean dynamic topography with the full error structure in a 3D stationary inverse ocean model improves modeled oceanographic features over previous estimates.
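
For reference, the relation between mean dynamic topography and geostrophic surface velocity mentioned at the start of this abstract can be written in its standard textbook form. The symbols below are not taken from the paper and are assumptions of this note: $\eta$ is the mean dynamic topography, $h_{\mathrm{MSS}}$ the mean sea surface height, $N$ the geoid height, $g$ the gravitational acceleration, $f$ the Coriolis parameter, $\Omega$ the Earth's rotation rate and $\varphi$ the latitude.

```latex
\eta = h_{\mathrm{MSS}} - N, \qquad
u = -\frac{g}{f}\,\frac{\partial \eta}{\partial y}, \qquad
v = \frac{g}{f}\,\frac{\partial \eta}{\partial x}, \qquad
f = 2\,\Omega \sin\varphi
```

Here $u$ and $v$ are the eastward and northward velocity components and $x$, $y$ the corresponding local coordinates. Because the velocities depend on gradients of $\eta$, errors in the topography are amplified, which is one reason the full error covariance matters for assimilation.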

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In recent years a global increase in jellyfish (i.e. Cnidarians and Ctenophores) abundance and a rise in the recurrence of jellyfish outbreak events have been widely debated, but a general consensus on this matter has not yet been achieved. Within this debate, it has been generally recognized that there is a lack of reliable data that could be analyzed and compared to clarify whether jellyfish are indeed increasing throughout the world ocean as a consequence of anthropogenic impact and hydroclimatic variability. During the G.O. Sars cruise jellyfish were collected at different depths in the 0-1000 m layer using a standard 1 m² Multiple Opening/Closing Net and Environmental Sensing System (MOCNESS) (quantitative data), and Harstad and macroplankton trawls (qualitative data). The comparison of records collected with different nets during the G.O. Sars transatlantic cruise shows that different sampling gears might provide very different information on jellyfish diversity. Indeed, the big trawls mostly collect relatively large scyphozoan and hydrozoan species such as Atolla, Pelagia, Praya, Vogtia, while small hydrozoans (e.g. Clytia, Gilia, Muggiaea) and early stages of ctenophores are only caught by the smaller nets.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The development of the ecosystem approach and models for the management of ocean marine resources requires easy access to standard validated datasets of historical catch data for the main exploited species, together with the model estimates achieved from these data, allowing model inter-comparison and evaluation of model skill. North Atlantic albacore tuna is exploited all year round by longline and in summer and autumn by surface fisheries, and fishery statistics are compiled by the International Commission for the Conservation of Atlantic Tunas (ICCAT). Catch and effort data with geographical coordinates, at monthly resolution on 1° or 5° squares, were extracted for this species with a careful definition of fisheries and data screening. Length frequencies of catch were also extracted according to the definition of fisheries for the period 1956-2010. Using these data, an application of the spatial ecosystem and population dynamics model (SEAPODYM) was developed for the North Atlantic albacore population and fisheries and provided the first spatially explicit estimate of albacore density in the North Atlantic by life stage. These densities by life stage (larval recruits, young immature fish, adult mature fish and total biomass) are provided in a gridded file (NetCDF) at a resolution of 2° x 2° x month.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Here we present a new, pan-North-Atlantic compilation of data on key mesozooplankton species, including the most important copepod, Calanus finmarchicus. Distributional data of eight representative zooplankton taxa, from recent (2000-2009) Continuous Plankton Recorder data, are presented, along with basin-scale data of the phytoplankton colour index. Then we present a compilation of data on C. finmarchicus, including observations of abundance, demography, egg production and female size, with accompanying data on temperature and chlorophyll. This is a contribution by Canadian, European and US scientists and their institutions.