201 results for large vesicles
Abstract:
MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.
Abstract:
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
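As an illustration of the matrix representation step behind MRP, the sketch below encodes rooted input trees as a binary character matrix, assuming the standard Baum-Ragan coding (one column per clade: "1" for taxa inside the clade, "0" for taxa elsewhere in the same tree, "?" for taxa missing from that tree). The nested-tuple tree format and the function names are hypothetical choices for this example, not taken from the study, and the parsimony search that would follow is not shown.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node except the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield frozenset(leaves(tree))
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(trees):
    """Build a Baum-Ragan matrix: one row per taxon, one column per clade."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")   # taxon belongs to the clade
                elif taxon in present:
                    rows[taxon].append("0")   # in this tree, outside the clade
                else:
                    rows[taxon].append("?")   # taxon absent from this tree
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    # Two small, partially overlapping input trees (hypothetical data).
    input_trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
    for taxon, row in mrp_matrix(input_trees).items():
        print(taxon, row)
```

The resulting matrix would then be analysed with a parsimony program to obtain the supertree; the question marks are how taxa missing from individual input trees enter the analysis.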
Abstract:
BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
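To make the idea of mixture-model CNV calling concrete, here is a minimal sketch of a one-dimensional Gaussian mixture fitted by expectation-maximisation. It is not the authors' method: it assumes that each probe region is summarised by a single log2 intensity ratio and that copy numbers 1, 2 and 3 sit near the idealised values -1, 0 and 0.585. The data are simulated and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm(values, init_means, n_iter=50, min_sd=0.05):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(init_means)
    means = list(init_means)
    sds = [0.2] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, means, sds


def most_likely_state(x, weights, means, sds):
    """Index of the component with the highest weighted density at x."""
    dens = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    return max(range(len(dens)), key=dens.__getitem__)


def simulate_log_ratios(seed=0):
    """Simulated log2 ratios: 50 deletions, 400 normal, 50 duplications."""
    rng = random.Random(seed)
    spec = [(-1.0, 50), (0.0, 400), (0.585, 50)]
    return [rng.gauss(mean, 0.15) for mean, n in spec for _ in range(n)]


if __name__ == "__main__":
    data = simulate_log_ratios()
    w, m, s = fit_gmm(data, init_means=[-1.0, 0.0, 0.585])
    print("weights:", [round(v, 3) for v in w])
    print("means:  ", [round(v, 3) for v in m])
    print("state for ratio -0.9:", most_likely_state(-0.9, w, m, s))
```

Real array intensities are noisier and the clusters are usually compressed towards zero, so the starting means and the minimum spread used here would need tuning on actual data.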
Abstract:
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, the data format is often the first obstacle. Without standardized ways of exploring different data layouts, each problem has to be solved from scratch. The ability to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling large numbers of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, in which data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; implementation in C++ with boost [1], SQLite [2] and Qt [3], requiring no installation and having a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files, which together ensure efficient plan execution; effortless support for millions of columns; per-value typing, which makes mixed text/number data easy to use; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It needs no prerequisites to run, as all of the libraries are included in the distribution package.
I test it against existing database solutions using a battery of benchmarks and discuss the results.
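The general idea of querying CSV data with SQL without declaring a schema can be sketched with Python's standard csv and sqlite3 modules. This is only an illustration under simplifying assumptions: unlike the "no copy" system described above, it copies every row into an in-memory SQLite table, and the helper names load_csv and to_value are hypothetical. Columns are created without declared types, so SQLite keeps the type of each individual value, which loosely mirrors the per-value typing mentioned in the abstract.

```python
import csv
import io
import sqlite3


def to_value(text):
    """Convert numeric-looking text to int or float; keep other text as is."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def quote_identifier(name):
    """Quote a table or column name for use in an SQL statement."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv(conn, table, handle):
    """Create a table from the CSV header and insert all rows (copies data)."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = ", ".join(quote_identifier(name) for name in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_identifier(table), columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_identifier(table), placeholders),
        ([to_value(cell) for cell in row] for row in reader),
    )


if __name__ == "__main__":
    sample = "id,name,score\n1,alpha,3.5\n2,beta,x\n3,gamma,4.5\n"
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "measurements", io.StringIO(sample))
    # Mixed text/number column: keep only the rows whose score is numeric.
    query = "SELECT name, score FROM measurements WHERE typeof(score) = 'real'"
    for row in conn.execute(query):
        print(row)
```

A plain import like this runs into the limits the abstract describes: a default SQLite build accepts at most 2000 columns per table, so very wide files need a different strategy, such as leaving the data in the CSV file.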
Abstract:
In the vast majority of bottom-up proteomics studies, protein digestion is performed using only mammalian trypsin. Although it is clearly the best enzyme available, the sole use of trypsin rarely leads to complete sequence coverage, even for abundant proteins. It is commonly assumed that this is because many tryptic peptides are either too short or too long to be identified by RPLC-MS/MS. We show through in silico analysis that 20-30% of the total sequence of three proteomes (Schizosaccharomyces pombe, Saccharomyces cerevisiae, and Homo sapiens) is expected to be covered by Large post-Trypsin Peptides (LpTPs) with M(r) above 3000 Da. We then established size exclusion chromatography to fractionate complex yeast tryptic digests into pools of peptides based on size. We found that secondary digestion of LpTPs followed by LC-MS/MS analysis leads to a significant increase in identified proteins and a 32-50% relative increase in average sequence coverage compared to trypsin digestion alone. Application of the developed strategy to analyze the phosphoproteomes of S. pombe and of a human cell line identified a significant fraction of novel phosphosites. Overall our data indicate that specific targeting of LpTPs can complement standard bottom-up workflows to reveal a largely neglected portion of the proteome.
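The in silico estimate can be illustrated with a short sketch that digests a protein sequence with an idealised trypsin rule and reports the fraction of residues that fall in peptides heavier than 3000 Da. The assumptions are mine, not the authors': cleavage after K or R unless the next residue is P, no missed cleavages, unmodified residues, and monoisotopic masses. Function names are hypothetical.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Split after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def fraction_in_large_peptides(sequence, threshold=3000.0):
    """Fraction of residues lying in tryptic peptides above the threshold."""
    large = sum(len(p) for p in tryptic_digest(sequence)
                if peptide_mass(p) > threshold)
    return large / len(sequence)


if __name__ == "__main__":
    # Hypothetical sequence: one long lysine-free stretch plus short peptides.
    protein = "MSTK" + "AEDLVGSTPQ" * 4 + "R" + "GGK" + "LLR"
    for pep in tryptic_digest(protein):
        print(len(pep), round(peptide_mass(pep), 2))
    print("fraction in large peptides:",
          round(fraction_in_large_peptides(protein), 3))
```

Applied to every protein in a proteome and summed over residues, this kind of calculation gives a coverage estimate comparable in spirit to the 20-30% figure quoted above, although the exact value depends on the cleavage rules and mass definition chosen.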
Abstract:
Introduction: Diffuse large B-cell lymphomas (DLBCL) represent a heterogeneous disease with variable clinical outcome. Identifying phenotypic biomarkers of tumor cells on paraffin sections that predict different clinical outcomes remains an important goal that may also help to better understand the biology of this lymphoma. Differentiating non-germinal centre B-cell-like (non-GCB) from germinal centre B-cell-like (GCB) DLBCL according to the Hans algorithm has been considered an important immunohistochemical biomarker with prognostic value among patients treated with R-CHOP, although it has not been reproducibly found by all groups. Gene expression studies have also shown that IgM expression might be used as a surrogate for the GCB and ABC subtypes, with a strong preferential expression of IgM in the ABC DLBCL subtype. An ImmunoFISH index based on the differential expression of MUM-1 and FOXP1 by immunohistochemistry and on the BCL6 rearrangement by FISH has been previously reported (C Copie-Bergman, J Clin Oncol. 2009;27:5573-9) as prognostic in a homogeneous series of DLBCL treated with R-CHOP. In addition, oncogenic MYC protein overexpression by immunohistochemistry may represent an easy tool to identify the consequences of MYC deregulation in DLBCL. Our aim was to analyse by immunohistochemistry the prognostic relevance of MYC, IgM, GCB/non-GCB subtype and ImmunoFISH index in a large series of de novo DLBCL treated with Rituximab (R)-chemotherapy (anthracycline-based) included in the 2003 program of the Groupe d'Etude des Lymphomes de l'Adulte (GELA) trials. Methods: The 2003 program included patients with de novo CD20+ DLBCL enrolled in 6 different LNH-03 GELA trials (LNH-03-1B, -B, -3B, 39B, -6B, 7B) stratifying patients according to age and age-adjusted IPI. Tumor samples were analyzed by immunohistochemistry using CD10, BCL6, MUM1, FOXP1 (according to Barrans threshold), MYC, IgM antibodies on tissue microarrays and by FISH using BCL6 split signal DNA probes.
Considering evaluable Hans score, 670 patients were included in the study, with 237 (35.4%) receiving the intensive R-ACVBP regimen and 433 (64.6%) R-CHOP/R-mini-CHOP. Results: 304 (45.4%) DLBCL were classified as GCB and 366 (54.6%) as non-GCB according to the Hans algorithm. 337/567 cases (59.4%) were positive for the ImmunoFISH index (i.e. two out of the three markers positive: MUM1 protein positive, FOXP1 protein Variable or Strong, BCL6 rearrangement). The ImmunoFISH index was preferentially positive in the non-GCB subtype (81.3%) compared to the GCB subtype (31.2%) (p<0.001). IgM was recorded as positive in tumor cells in 351/637 (52.4%) DLBCL cases, with a preferential expression in the non-GCB subtype, 195 (53.3%), vs the GCB subtype, 100 (32.9%) (p<0.001). MYC was positive in 170/577 (29.5%) cases with a 40% cut-off and in 44/577 (14.2%) cases with a cut-off of 70%. There was no preferential expression of MYC among GCB or non-GCB subtype (p>0.4) for both cut-offs. Progression-free Survival (PFS) was significantly worse among patients with high IPI score (p<0.0001), IgM positive tumor (p<0.0001), MYC positive tumor with a 40% threshold (p<0.001), ImmunoFISH positive index (p<0.002), non-GCB DLBCL subtype (p<0.0001). Overall Survival (OS) was also significantly worse among patients with high IPI score (p<0.0001), IgM positive tumor (p=0.02), MYC positive tumor with a 40% threshold (p<0.01), ImmunoFISH positive index (p=0.02), non-GCB DLBCL subtype (p<0.0001). All significant parameters were included in a multivariate analysis using a Cox model and, in addition to IPI, only the GCB/non-GCB subtype according to the Hans algorithm significantly predicted a worse PFS in the non-GCB subgroup (HR 1.9 [1.3-2.8], p=0.002) as well as a worse OS (HR 2.0 [1.3-3.2], p=0.003). This strong prognostic value of non-GCB subtyping was confirmed considering only patients treated with R-CHOP for PFS (HR 2.1 [1.4-3.3], p=0.001) and for OS (HR 2.3 [1.3-3.8], p=0.002).
Conclusion: Our study on a large series of patients included in trials confirmed the relevance of immunohistochemistry as a useful tool to identify significant prognostic biomarkers for clinical use. We show here that IgM and MYC might be useful prognostic biomarkers. In addition, we confirmed in this series the prognostic value of the ImmunoFISH index. Above all, we fully validated the strong and independent prognostic value of the Hans algorithm, which pathologists use daily to subtype DLBCL.
Abstract:
Background and Aims: The 2007 ECCO guidelines on anemia in inflammatory bowel disease (IBD) favour intravenous (iv) over oral (po) iron supplementation due to better effectiveness and tolerance. Application of guidelines in clinical practice may require time. We aimed to determine the percentage of IBD patients under iron supplementation therapy and its application mode over time in a large IBD cohort. Methods: Helsana, a leading Swiss health insurance company, provides coverage for approximately 18% of the Swiss population, corresponding to about 1.2 million enrollees. Patients with Crohn's disease (CD) and ulcerative colitis (UC) were identified by keyword search from the anonymised Helsana database. Results: In total, 629 CD (61% female) and 403 UC (56% female) patients were identified; mean retrospective observation time was 20.4 months for CD and 13 months for UC patients. Of the entire study population, 29.3% were prescribed iron. Occurrence of iron prescription was 21.3% in males and 31.2% in females (odds ratio [OR] 1.69, 95%-confidence interval [CI] 1.26-2.28). The prescription of iv iron increased from 2006/2007 (48.8% with iv iron) to 2008/2009 (65.2% with iv iron) by a factor of 1.89. Conclusions: One third of the IBD population was treated with iron supplementation. A gradual shift from oral to iv iron was observed over time in a large Swiss IBD cohort. This switch in prescription habits goes along with the implementation of the ECCO consensus guidelines on anemia in IBD.
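For readers who want to see how an odds ratio relates to the two prescription rates, the sketch below recomputes it from the rounded percentages reported for males (21.3%) and females (31.2%). Because it starts from rounded rates rather than the underlying patient counts, which the abstract does not give, it yields about 1.68 rather than exactly the published 1.69, and the confidence interval cannot be reproduced this way. The function names are hypothetical.

```python
def odds(proportion):
    """Convert a proportion (0 < p < 1) to odds."""
    return proportion / (1.0 - proportion)


def odds_ratio(p_exposed, p_reference):
    """Odds ratio of two proportions."""
    return odds(p_exposed) / odds(p_reference)


if __name__ == "__main__":
    # Rates of iron prescription taken from the abstract (rounded values).
    females, males = 0.312, 0.213
    print("odds ratio, females vs males:", round(odds_ratio(females, males), 2))
```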
Abstract:
We produced three monoclonal antibodies, BF7, GE2 and CG12, against cultured human glioma cells. Their specificity was tested by an indirect antibody-binding radioimmunoassay on a panel of glial and non-glial tumor cell lines. BF7 and GE2 react preferentially with glioma cells and, except for one colon carcinoma line, they do not bind to the control non-neuroectodermal cells; they appear to be directed against common malignant glioma associated antigens. CG12, the third monoclonal antibody, binds to the great majority of tumor cell lines of neuroectodermal origin and does not bind to any other cell lines tested.