19 resultados para Dataset

em Duke University


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes a methodology for detecting anomalies from sequentially observed and potentially noisy data. The proposed approach consists of two main elements: 1) filtering, or assigning a belief or likelihood to each successive measurement based upon our ability to predict it from previous noisy observations and 2) hedging, or flagging potential anomalies by comparing the current belief against a time-varying and data-adaptive threshold. The threshold is adjusted based on the available feedback from an end user. Our algorithms, which combine universal prediction with recent work on online convex programming, do not require computing posterior distributions given all current observations and involve simple primal-dual parameter updates. At the heart of the proposed approach lie exponential-family models which can be used in a wide variety of contexts and applications, and which yield methods that achieve sublinear per-round regret against both static and slowly varying product distributions with marginals drawn from the same exponential family. Moreover, the regret against static distributions coincides with the minimax value of the corresponding online strongly convex game. We also prove bounds on the number of mistakes made during the hedging step relative to the best offline choice of the threshold with access to all estimated beliefs and feedback signals. We validate the theory on synthetic data drawn from a time-varying distribution over binary vectors of high dimensionality, as well as on the Enron email dataset. © 1963-2012 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Measuring the entorhinal cortex (ERC) is challenging due to lateral border discrimination from the perirhinal cortex. From a sample of 39 nondemented older adults who completed volumetric image scans and verbal memory indices, we examined reliability and validity concerns for three ERC protocols with different lateral boundary guidelines (i.e., Goncharova, Dickerson, Stoub, & deToledo-Morrell, 2001; Honeycutt et al., 1998; Insausti et al., 1998). We used three novice raters to assess inter-rater reliability on a subset of scans (216 total ERCs), with the entire dataset measured by one rater with strong intra-rater reliability on each technique (234 total ERCs). We found moderate to strong inter-rater reliability for two techniques with consistent ERC lateral boundary endpoints (Goncharova, Honeycutt), with negligible to moderate reliability for the technique requiring consideration of collateral sulcal depth (Insausti). Left ERC and story memory associations were moderate and positive for two techniques designed to exclude the perirhinal cortex (Insausti, Goncharova), with the Insausti technique continuing to explain 10% of memory score variance after additionally controlling for depression symptom severity. Right ERC-story memory associations were nonexistent after excluding an outlier. Researchers are encouraged to consider challenges of rater training for ERC techniques and how lateral boundary endpoints may impact structure-function associations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: Invasive fungal infections (IFIs) are a major cause of morbidity and mortality among organ transplant recipients. Multicenter prospective surveillance data to determine disease burden and secular trends are lacking. METHODS: The Transplant-Associated Infection Surveillance Network (TRANSNET) is a consortium of 23 US transplant centers, including 15 that contributed to the organ transplant recipient dataset. We prospectively identified IFIs among organ transplant recipients from March, 2001 through March, 2006 at these sites. To explore trends, we calculated the 12-month cumulative incidence among 9 sequential cohorts. RESULTS: During the surveillance period, 1208 IFIs were identified among 1063 organ transplant recipients. The most common IFIs were invasive candidiasis (53%), invasive aspergillosis (19%), cryptococcosis (8%), non-Aspergillus molds (8%), endemic fungi (5%), and zygomycosis (2%). Median time to onset of candidiasis, aspergillosis, and cryptococcosis was 103, 184, and 575 days, respectively. Among a cohort of 16,808 patients who underwent transplantation between March 2001 and September 2005 and were followed through March 2006, a total of 729 IFIs were reported among 633 persons. One-year cumulative incidences of the first IFI were 11.6%, 8.6%, 4.7%, 4.0%, 3.4%, and 1.3% for small bowel, lung, liver, heart, pancreas, and kidney transplant recipients, respectively. One-year incidence was highest for invasive candidiasis (1.95%) and aspergillosis (0.65%). Trend analysis showed a slight increase in cumulative incidence from 2002 to 2005. CONCLUSIONS: We detected a slight increase in IFIs during the surveillance period. These data provide important insights into the timing and incidence of IFIs among organ transplant recipients, which can help to focus effective prevention and treatment strategies.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors; with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: Biological processes occur on a vast range of time scales, and many of them occur concurrently. As a result, system-wide measurements of gene expression have the potential to capture many of these processes simultaneously. The challenge however, is to separate these processes and time scales in the data. In many cases the number of processes and their time scales is unknown. This issue is particularly relevant to developmental biologists, who are interested in processes such as growth, segmentation and differentiation, which can all take place simultaneously, but on different time scales. RESULTS: We introduce a flexible and statistically rigorous method for detecting different time scales in time-series gene expression data, by identifying expression patterns that are temporally shifted between replicate datasets. We apply our approach to a Saccharomyces cerevisiae cell-cycle dataset and an Arabidopsis thaliana root developmental dataset. In both datasets our method successfully detects processes operating on several different time scales. Furthermore we show that many of these time scales can be associated with particular biological functions. CONCLUSIONS: The spatiotemporal modules identified by our method suggest the presence of multiple biological processes, acting at distinct time scales in both the Arabidopsis root and yeast. Using similar large-scale expression datasets, the identification of biological processes acting at multiple time scales in many organisms is now possible.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Tumor microenvironmental stresses, such as hypoxia and lactic acidosis, play important roles in tumor progression. Although gene signatures reflecting the influence of these stresses are powerful approaches to link expression with phenotypes, they do not fully reflect the complexity of human cancers. Here, we describe the use of latent factor models to further dissect the stress gene signatures in a breast cancer expression dataset. The genes in these latent factors are coordinately expressed in tumors and depict distinct, interacting components of the biological processes. The genes in several latent factors are highly enriched in chromosomal locations. When these factors are analyzed in independent datasets with gene expression and array CGH data, the expression values of these factors are highly correlated with copy number alterations (CNAs) of the corresponding BAC clones in both the cell lines and tumors. Therefore, variation in the expression of these pathway-associated factors is at least partially caused by variation in gene dosage and CNAs among breast cancers. We have also found the expression of two latent factors without any chromosomal enrichment is highly associated with 12q CNA, likely an instance of "trans"-variations in which CNA leads to the variations in gene expression outside of the CNA region. In addition, we have found that factor 26 (1q CNA) is negatively correlated with HIF-1alpha protein and hypoxia pathways in breast tumors and cell lines. This agrees with, and for the first time links, known good prognosis associated with both a low hypoxia signature and the presence of CNA in this region. Taken together, these results suggest the possibility that tumor segmental aneuploidy makes significant contributions to variation in the lactic acidosis/hypoxia gene signatures in human cancers and demonstrate that latent factor analysis is a powerful means to uncover such a linkage.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A steady increase in knowledge of the molecular and antigenic structure of the gp120 and gp41 HIV-1 envelope glycoproteins (Env) is yielding important new insights for vaccine design, but it has been difficult to translate this information to an immunogen that elicits broadly neutralizing antibodies. To help bridge this gap, we used phylogenetically corrected statistical methods to identify amino acid signature patterns in Envs derived from people who have made potently neutralizing antibodies, with the hypothesis that these Envs may share common features that would be useful for incorporation in a vaccine immunogen. Before attempting this, essentially as a control, we explored the utility of our computational methods for defining signatures of complex neutralization phenotypes by analyzing Env sequences from 251 clonal viruses that were differentially sensitive to neutralization by the well-characterized gp120-specific monoclonal antibody, b12. We identified ten b12-neutralization signatures, including seven either in the b12-binding surface of gp120 or in the V2 region of gp120 that have been previously shown to impact b12 sensitivity. A simple algorithm based on the b12 signature pattern was predictive of b12 sensitivity/resistance in an additional blinded panel of 57 viruses. Upon obtaining these reassuring outcomes, we went on to apply these same computational methods to define signature patterns in Env from HIV-1 infected individuals who had potent, broadly neutralizing responses. We analyzed a checkerboard-style neutralization dataset with sera from 69 HIV-1-infected individuals tested against a panel of 25 different Envs. Distinct clusters of sera with high and low neutralization potencies were identified. Six signature positions in Env sequences obtained from the 69 samples were found to be strongly associated with either the high or low potency responses. Five sites were in the CD4-induced coreceptor binding site of gp120, suggesting an important role for this region in the elicitation of broadly neutralizing antibody responses against HIV-1.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: A major challenge in oncology is the selection of the most effective chemotherapeutic agents for individual patients, while the administration of ineffective chemotherapy increases mortality and decreases quality of life in cancer patients. This emphasizes the need to evaluate every patient's probability of responding to each chemotherapeutic agent and limiting the agents used to those most likely to be effective. METHODS AND RESULTS: Using gene expression data on the NCI-60 and corresponding drug sensitivity, mRNA and microRNA profiles were developed representing sensitivity to individual chemotherapeutic agents. The mRNA signatures were tested in an independent cohort of 133 breast cancer patients treated with the TFAC (paclitaxel, 5-fluorouracil, adriamycin, and cyclophosphamide) chemotherapy regimen. To further dissect the biology of resistance, we applied signatures of oncogenic pathway activation and performed hierarchical clustering. We then used mRNA signatures of chemotherapy sensitivity to identify alternative therapeutics for patients resistant to TFAC. Profiles from mRNA and microRNA expression data represent distinct biologic mechanisms of resistance to common cytotoxic agents. The individual mRNA signatures were validated in an independent dataset of breast tumors (P = 0.002, NPV = 82%). When the accuracy of the signatures was analyzed based on molecular variables, the predictive ability was found to be greater in basal-like than non basal-like patients (P = 0.03 and P = 0.06). Samples from patients with co-activated Myc and E2F represented the cohort with the lowest percentage (8%) of responders. Using mRNA signatures of sensitivity to other cytotoxic agents, we predict that TFAC non-responders are more likely to be sensitive to docetaxel (P = 0.04), representing a viable alternative therapy. CONCLUSIONS: Our results suggest that the optimal strategy for chemotherapy sensitivity prediction integrates molecular variables such as ER and HER2 status with corresponding microRNA and mRNA expression profiles. Importantly, we also present evidence to support the concept that analysis of molecular variables can present a rational strategy to identifying alternative therapeutic opportunities.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of data (tens of thousands of profiles) associated with a certain level of noise remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied to the same microarray time series dataset four distinct mathematical methods to identify significant patterns in gene expression profiles. These methods are called: Phase consistency, Address reduction, Cyclohedron test and Stable persistence, and are based on different conceptual frameworks that are either hypothesis- or data-driven. Some of the methods, unlike Fourier transforms, are not dependent on the assumption of periodicity of the pattern of interest. Remarkably, these methods identified blindly the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos. Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Aquifer denitrification is among the most poorly constrained fluxes in global and regional nitrogen budgets. The few direct measurements of denitrification in groundwaters provide limited information about its spatial and temporal variability, particularly at the scale of whole aquifers. Uncertainty in estimates of denitrification may also lead to underestimates of its effect on isotopic signatures of inorganic N, and thereby confound the inference of N source from these data. In this study, our objectives are to quantify the magnitude and variability of denitrification in the Upper Floridan Aquifer (UFA) and evaluate its effect on N isotopic signatures at the regional scale. Using dual noble gas tracers (Ne, Ar) to generate physical predictions of N2 gas concentrations for 112 observations from 61 UFA springs, we show that excess (i.e. denitrification-derived) N2 is highly variable in space and inversely correlated with dissolved oxygen (O2). Negative relationships between O2 and δ15N NO3 across a larger dataset of 113 springs, well-constrained isotopic fractionation coefficients, and strong 15N:18O covariation further support inferences of denitrification in this uniquely organic-matter-poor system. Despite relatively low average rates, denitrification accounted for 32 % of estimated aquifer N inputs across all sampled UFA springs. Back-calculations of source δ15N NO3 based on denitrification progression suggest that isotopically-enriched nitrate (NO3-) in many springs of the UFA reflects groundwater denitrification rather than urban- or animal-derived inputs. © Author(s) 2012.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using case-control data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based, case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis was two staged: a screen using marginal Bayes factors (BFs) for 484 SNPs and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and genotypes of other SNPs and allowed for uncertainty in the genetic parameterizations of the SNPs and number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median OR(odds ratio)(per allele) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median OR(per allele) = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median OR(per allele) = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median OR(per allele) = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median OR (rare homozygote) = 0.53; 95% CI = 0.35 - 0.79) in BACH1 and rs10131 (median OR( rare homozygote) = not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in LD with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models acommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The third wave of the National Congregations Study (NCS-III) was conducted in 2012. The 2012 General Social Survey asked respondents who attend religious services to name their religious congregation, producing a nationally representative cross-section of congregations from across the religious spectrum. Data about these congregations was collected via a 50-minute interview with one key informant from 1,331 congregations. Information was gathered about multiple aspects of congregations’ social composition, structure, activities, and programming. Approximately two-thirds of the NCS-III questionnaire replicates items from 1998 or 2006-07 NCS waves. Each congregation was geocoded, and selected data from the 2010 United States census or American Community Survey have been appended. We describe NCS-III methodology and use the cumulative NCS dataset (containing 4,071 cases) to describe five trends: more ethnic diversity, greater acceptance of gays and lesbians, increasingly informal worship styles, declining size (but not from the perspective of the average attendee), and declining denominational affiliation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The tendency for island populations of mammalian taxa to diverge in body size from their mainland counterparts consistently in particular directions is both impressive for its regularity and, especially among rodents, troublesome for its exceptions. However, previous studies have largely ignored mainland body size variation, treating size differences of any magnitude as equally noteworthy. Here, we use distributions of mainland population body sizes to identify island populations as 'extremely' big or small, and we compare traits of extreme populations and their islands with those of island populations more typical in body size. We find that although insular rodents vary in the directions of body size change, 'extreme' populations tend towards gigantism. With classification tree methods, we develop a predictive model, which points to resource limitations as major drivers in the few cases of insular dwarfism. Highly successful in classifying our dataset, our model also successfully predicts change in untested cases.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In 1986, New Zealand responded to the open-access problem by establishing the world's largest individual transferable quota (ITQ) system. Using a 15-year panel dataset from New Zealand that covers 33 species and more than 150 markets for fishing quotas, we assess trends in market activity, price dispersion, and the fundamentals determining quota prices. We find that market activity is sufficiently high in the economically important markets and that price dispersion has decreased. We also find evidence of economically rational behavior through the relationship between quota lease and sale prices and fishing output and input prices, ecological variability, and market interest rates. Controlling for these factors, our results show a greater increase in quota prices for fish stocks that faced significant reductions, consistent with increased profitability due to rationalization. Overall, this suggests that these markets are operating reasonably well, implying that ITQs can be effective instruments for efficient fisheries management. © 2004 Elsevier Inc. All rights reserved.