988 results for Large friction
Abstract:
BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and in using it to explore structural variants and their impact on complex traits.
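The idea of calling copy-number states with a Gaussian mixture can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single summary value per sample at one locus (a hypothetical log-ratio of probe intensity), three states, and simulated data. The function names, starting means and starting variance are all assumptions made for the example.

```python
import math
import random

STATES = ["loss", "normal", "gain"]
START_MEANS = [-0.5, 0.0, 0.4]   # assumed log-ratio centres, illustrative only


def density(x, mean, var):
    """Gaussian probability density at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, start_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(start_means)
    means = list(start_means)
    variances = [0.01] * k           # assumed starting variance
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            dens = [w * density(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue             # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-4)   # floor avoids a collapsing component
    return weights, means, variances


def call_state(x, weights, means, variances):
    """Return the most probable copy-number state for one value."""
    scores = [w * density(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    return STATES[scores.index(max(scores))]


# Simulated cohort: mostly normal copy number, a few losses and gains.
random.seed(0)
log_ratios = ([random.gauss(-0.5, 0.08) for _ in range(30)]
              + [random.gauss(0.0, 0.08) for _ in range(200)]
              + [random.gauss(0.4, 0.08) for _ in range(20)])
weights, means, variances = fit_gmm_1d(log_ratios, START_MEANS)
print([round(m, 2) for m in means])
print(call_state(-0.45, weights, means, variances))
```

A real pipeline would work with both allele intensities across many probes and would need to handle loci where one of the states is absent from the cohort.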
Abstract:
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, the data format is often the first obstacle. The lack of standardized ways of exploring different data layouts means the problem has to be solved from scratch each time. The ability to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if their horizontal dimension reaches thousands of columns. Most databases are optimized for handling a large number of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, in which data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; an implementation in C++ with boost [1], SQLite [2] and Qt [3] that does not require installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files, which ensure efficient plan execution; effortless support for millions of columns; per-value typing, which makes mixed text/number data easy to use; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It does not need any prerequisites to run, as all of the libraries are included in the distribution package.
I test it against existing database solutions using a battery of benchmarks and discuss the results.
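To make the goal of querying CSV data with SQL concrete, here is a minimal sketch using Python's standard `csv` and `sqlite3` modules. It is not the system described above: it copies the data into an in-memory SQLite table instead of leaving it in the CSV file, and the sample data, the `to_value` helper and the `load_csv` helper are assumptions made for the example. It does show schema-free loading and per-value typing, since SQLite stores a type with each value when a column is declared without one.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content with a mixed text/number column ("height").
CSV_TEXT = "sample,height,weight\na,170,65.5\nb,182,80\nc,n/a,72\n"


def to_value(text):
    """Convert a CSV field to int or float when possible, else keep the text."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_csv(conn, table, handle):
    """Create an untyped table from the CSV header and insert all rows."""
    reader = csv.reader(handle)
    header = next(reader)
    # Quote column names; no column types are declared (no schema needed).
    cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        ([to_value(field) for field in row] for row in reader),
    )


conn = sqlite3.connect(":memory:")
load_csv(conn, "data", io.StringIO(CSV_TEXT))

# typeof() filters out the row whose height is the text "n/a".
rows = conn.execute(
    "SELECT sample, weight FROM data "
    "WHERE typeof(height) = 'integer' AND height > 175"
).fetchall()
print(rows)
```

This approach inherits SQLite's per-table column limit (2000 by default), so it does not address the very wide layouts the abstract targets.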
Abstract:
The National Academies has stressed the need to develop quantifiable measures for methods that are currently qualitative in nature, such as the examination of fingerprints. Current protocols and procedures to perform these examinations rely heavily on a succession of subjective decisions, from the initial acceptance of evidence for probative value to the final assessment of forensic results. This project studied the concept of sufficiency associated with the decisions made by latent print examiners at the end of the various phases of the examination process. During this 2-year effort, a web-based interface was designed to capture the observations of 146 latent print examiners and trainees on 15 pairs of latent/control prints. Two main findings resulted from the study: (1) the concept of sufficiency is driven mainly by the number of, and spatial relationships between, the minutiae observed on the latent and control prints, and data indicate that demographics (training, certification, years of experience) or non-minutiae based features (such as level 3 features) do not play a major role in examiners' decisions; (2) significant variability was observed between detecting and interpreting friction ridge features and at all levels of details, as well as for factors that have the potential to influence the examination process, such as degradation, distortion, or influence of the background and the development technique.
Abstract:
In the vast majority of bottom-up proteomics studies, protein digestion is performed using only mammalian trypsin. Although it is clearly the best enzyme available, the sole use of trypsin rarely leads to complete sequence coverage, even for abundant proteins. It is commonly assumed that this is because many tryptic peptides are either too short or too long to be identified by RPLC-MS/MS. We show through in silico analysis that 20-30% of the total sequence of three proteomes (Schizosaccharomyces pombe, Saccharomyces cerevisiae, and Homo sapiens) is expected to be covered by Large post-Trypsin Peptides (LpTPs) with M(r) above 3000 Da. We then established size exclusion chromatography to fractionate complex yeast tryptic digests into pools of peptides based on size. We found that secondary digestion of LpTPs followed by LC-MS/MS analysis leads to a significant increase in identified proteins and a 32-50% relative increase in average sequence coverage compared to trypsin digestion alone. Application of the developed strategy to analyze the phosphoproteomes of S. pombe and of a human cell line identified a significant fraction of novel phosphosites. Overall our data indicate that specific targeting of LpTPs can complement standard bottom-up workflows to reveal a largely neglected portion of the proteome.
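The in silico step can be sketched as follows. This is an illustrative assumption about how such an analysis might be set up, not the authors' pipeline: it uses the common rule of cleaving after K or R unless the next residue is P, ignores missed cleavages and modifications, and uses standard monoisotopic residue masses rounded to five decimals. The function names and the test sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (rounded), plus one water per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence):
    """Split after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def large_peptide_fraction(sequence, threshold=3000.0):
    """Fraction of residues that fall in tryptic peptides above the threshold."""
    large = sum(len(p) for p in digest(sequence) if peptide_mass(p) > threshold)
    return large / len(sequence)


# Hypothetical sequence: one long peptide followed by one short peptide.
example = "A" * 50 + "K" + "GGK"
print(digest("AAKRPGGKCC"))
print(round(large_peptide_fraction(example), 3))
```

Applied to every protein in a proteome and weighted by sequence length, the same function would give a coverage estimate of the kind quoted in the abstract.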
Abstract:
Introduction: Diffuse large B-cell lymphomas (DLBCL) represent a heterogeneous disease with variable clinical outcome. Identifying phenotypic biomarkers of tumor cells on paraffin sections that predict different clinical outcomes remains an important goal that may also help to better understand the biology of this lymphoma. Differentiating non-germinal centre B-cell-like (non-GCB) from germinal centre B-cell-like (GCB) DLBCL according to the Hans algorithm has been considered an important immunohistochemical biomarker with prognostic value among patients treated with R-CHOP, although not reproducibly found by all groups. Gene expression studies have also shown that IgM expression might be used as a surrogate for the GCB and ABC subtypes, with a strong preferential expression of IgM in the ABC DLBCL subtype. The ImmunoFISH index, based on the differential expression of MUM-1 and FOXP1 by immunohistochemistry and on the BCL6 rearrangement by FISH, has been previously reported (C Copie-Bergman, J Clin Oncol. 2009;27:5573-9) as prognostic in a homogeneous series of DLBCL treated with R-CHOP. In addition, oncogenic MYC protein overexpression by immunohistochemistry may represent an easy tool to identify the consequences of MYC deregulation in DLBCL. Our aim was to analyse by immunohistochemistry the prognostic relevance of MYC, IgM, GCB/non-GCB subtype and ImmunoFISH index in a large series of de novo DLBCL treated with Rituximab (R)-chemotherapy (anthracycline-based) included in the 2003 program of the Groupe d'Etude des Lymphomes de l'Adulte (GELA) trials. Methods: The 2003 program included patients with de novo CD20+ DLBCL enrolled in 6 different LNH-03 GELA trials (LNH-03-1B, -B, -3B, 39B, -6B, 7B) stratifying patients according to age and age-adjusted IPI. Tumor samples were analyzed by immunohistochemistry using CD10, BCL6, MUM1, FOXP1 (according to Barrans threshold), MYC and IgM antibodies on tissue microarrays and by FISH using BCL6 split signal DNA probes.
Considering evaluable Hans score, 670 patients were included in the study, with 237 (35.4%) receiving the intensive R-ACVBP regimen and 433 (64.6%) R-CHOP/R-mini-CHOP. Results: 304 (45.4%) DLBCL were classified as GCB and 366 (54.6%) as non-GCB according to the Hans algorithm. 337/567 cases (59.4%) were positive for the ImmunoFISH index (i.e. two out of the three markers positive: MUM1 protein positive, FOXP1 protein Variable or Strong, BCL6 rearrangement). The ImmunoFISH index was preferentially positive in the non-GCB subtype (81.3%) compared to the GCB subtype (31.2%) (p<0.001). IgM was recorded as positive in tumor cells in 351/637 (52.4%) DLBCL cases, with a preferential expression in the non-GCB subtype, 195 (53.3%), vs the GCB subtype, 100 (32.9%) (p<0.001). MYC was positive in 170/577 (29.5%) cases with a 40% cut-off and in 44/577 (14.2%) cases with a cut-off of 70%. There was no preferential expression of MYC among GCB or non-GCB subtype (p>0.4) for both cut-offs. Progression-free Survival (PFS) was significantly worse among patients with high IPI score (p<0.0001), IgM positive tumor (p<0.0001), MYC positive tumor with a 40% threshold (p<0.001), ImmunoFISH positive index (p<0.002) and non-GCB DLBCL subtype (p<0.0001). Overall Survival (OS) was also significantly worse among patients with high IPI score (p<0.0001), IgM positive tumor (p=0.02), MYC positive tumor with a 40% threshold (p<0.01), ImmunoFISH positive index (p=0.02) and non-GCB DLBCL subtype (p<0.0001). All significant parameters were included in a multivariate analysis using the Cox model and, in addition to IPI, only the GCB/non-GCB subtype according to the Hans algorithm significantly predicted a worse PFS in the non-GCB subgroup (HR 1.9 [1.3-2.8], p=0.002) as well as a worse OS (HR 2.0 [1.3-3.2], p=0.003). This strong prognostic value of non-GCB subtyping was confirmed considering only patients treated with R-CHOP for PFS (HR 2.1 [1.4-3.3], p=0.001) and for OS (HR 2.3 [1.3-3.8], p=0.002).
Conclusion: Our study on a large series of patients included in trials confirmed the relevance of immunohistochemistry as a useful tool to identify significant prognostic biomarkers for clinical use. We show here that IgM and MYC might be useful prognostic biomarkers. In addition, we confirmed in this series the prognostic value of the ImmunoFISH index. Above all, we fully validated the strong and independent prognostic value of the Hans algorithm, used daily by pathologists to subtype DLBCL.
Abstract:
This paper is devoted to proving a large-deviation principle for solutions to multidimensional stochastic Volterra equations.
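For orientation, the statement of a large-deviation principle can be written out in a generic small-noise setting. The abstract does not give the form of the equation or the rate function, so the equation below, the coefficients b and σ, the noise scaling and the path space are all assumptions chosen to show what such a result usually looks like.

```latex
% Assumed generic form of a small-noise stochastic Volterra equation in R^d
\[
  X^{\varepsilon}_t = x_0 + \int_0^t b(t,s,X^{\varepsilon}_s)\,ds
    + \sqrt{\varepsilon}\int_0^t \sigma(t,s,X^{\varepsilon}_s)\,dW_s ,
  \qquad t \in [0,T].
\]

% The family (X^\varepsilon) satisfies a large-deviation principle with
% rate function I on the path space if, for every closed set F and every
% open set G of paths,
\[
  \limsup_{\varepsilon \to 0} \varepsilon \log
    \mathbb{P}\bigl(X^{\varepsilon} \in F\bigr) \le -\inf_{\varphi \in F} I(\varphi),
  \qquad
  \liminf_{\varepsilon \to 0} \varepsilon \log
    \mathbb{P}\bigl(X^{\varepsilon} \in G\bigr) \ge -\inf_{\varphi \in G} I(\varphi).
\]
```

Informally, the probability that the solution stays near a path φ decays like exp(-I(φ)/ε) as the noise intensity ε tends to zero.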