7 results for collection count

at Duke University


Relevance: 30.00%

Abstract:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
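The abstract does not say how the Gaussian approximation is constructed. One plausible reading, sketched below purely as an assumption, is moment matching: a PG(b, c) variable has known mean and variance (Polson et al., 2013), and for large b it behaves like a sum of many independent PG(1, c) terms, so a normal draw with the same two moments is a natural stand-in. The function names `pg_moments` and `sample_pg_gaussian` are hypothetical and do not come from the dissertation.

```python
import math
import random


def pg_moments(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable.

    Uses the closed forms from Polson et al. (2013), with the c -> 0
    limits (b/4 and b/24) applied near zero to avoid cancellation error.
    """
    if abs(c) < 1e-3:
        return b / 4.0, b / 24.0
    mean = b / (2.0 * c) * math.tanh(c / 2.0)
    var = b / (4.0 * c ** 3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2
    return mean, var


def sample_pg_gaussian(b, c, rng=random):
    """Approximate PG(b, c) draw via a moment-matched normal.

    Assumption: this is only reasonable when b (the count) is large;
    it is an illustration, not the dissertation's actual sampler.
    """
    mean, var = pg_moments(b, c)
    return rng.gauss(mean, math.sqrt(var))


if __name__ == "__main__":
    rng = random.Random(0)
    print(pg_moments(500.0, 1.2))
    print(sample_pg_gaussian(500.0, 1.2, rng))
```

The appeal of such an approximation is cost: exact Pólya-Gamma samplers do work that grows with b, whereas a normal draw is constant time regardless of how large the counts are.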

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
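As a small illustration of the "bag of words" representation mentioned above, the sketch below reduces a document to term counts and evaluates the multinomial log-probability of those counts under a fixed vocabulary distribution. It covers only the static likelihood for a single document, not the state-space dynamics the abstract describes; the helper names and the toy vocabulary are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Reduce a document to unordered term counts (word order is discarded)."""
    return Counter(text.lower().split())


def multinomial_log_pmf(counts, probs):
    """Log-probability of the term counts under a multinomial distribution.

    `probs` maps each vocabulary term to its probability; every term in
    `counts` is assumed to be present in `probs` with positive mass.
    """
    n = sum(counts.values())
    log_p = math.lgamma(n + 1)  # log n!
    for term, k in counts.items():
        log_p += k * math.log(probs[term]) - math.lgamma(k + 1)
    return log_p


if __name__ == "__main__":
    doc = bag_of_words("count data count models")
    theme = {"count": 0.5, "data": 0.25, "models": 0.25}  # toy distribution
    print(doc, multinomial_log_pmf(doc, theme))
```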

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron-level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age-specific latent natural ability class and a performance-enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When a player's performance deviates significantly from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevance: 20.00%

Abstract:

High-efficiency collection of photons emitted by a point source over a wide field of view (FoV) is crucial for many applications. Multiscale optics offer improved light collection by utilizing small optical components placed close to the optical source, while maintaining a wide FoV provided by conventional imaging optics. In this work, we demonstrate collection efficiency of 26% of photons emitted by a pointlike source using a micromirror fabricated in silicon with no significant decrease in collection efficiency over a 10 mm object space.

Relevance: 20.00%

Abstract:

The Census of Marine Life aids practical work of the Convention on Biological Diversity, discovers and tracks ocean biodiversity, and supports marine environmental planning.

Relevance: 20.00%

Abstract:

BACKGROUND: Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has had a further impact, as paper CRFs typically leveraged for quality measurement are not used in EDC processes. METHODS AND PRINCIPAL FINDINGS: The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated methodology for holistically assessing data quality on EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to an absence of medical record abstraction on the trials we examined, and to an outpatient setting characterized by less acute patient conditions. CONCLUSIONS: Historically, medical record abstraction is the most significant source of error by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record, dependencies that should be considered when developing data quality benchmarks.
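The error-rate metric quoted above is a simple normalization: discrepancies found, divided by fields inspected, scaled to 10,000 fields. A minimal sketch follows; the function name is hypothetical, and the counts in the usage example are made-up round numbers chosen only to show the arithmetic, not figures from the study.

```python
def errors_per_10k_fields(error_count, field_count):
    """Normalize an audit's discrepancy count to errors per 10,000 fields."""
    if field_count <= 0:
        raise ValueError("field_count must be positive")
    return 10_000 * error_count / field_count


if __name__ == "__main__":
    # Hypothetical audit: 143 discrepancies across 100,000 inspected fields.
    print(errors_per_10k_fields(143, 100_000))
```

Normalizing this way lets audits of different sizes be compared directly, which is what allows the abstract to set its rate against published source-to-database and CRF-to-database figures.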

Relevance: 20.00%

Abstract:

Traces the history of Duke's East Asian Studies program and associated library collections from the beginning of the twentieth century to the present. Describes the strengths of the Japanese, Chinese and Korean collections, materials in special collections and cooperation with the University of North Carolina.

Relevance: 20.00%

Abstract:

We recently developed an approach for testing the accuracy of network inference algorithms by applying them to biologically realistic simulations with known network topology. Here, we seek to determine the degree to which the network topology and data sampling regime influence the ability of our Bayesian network inference algorithm, NETWORKINFERENCE, to recover gene regulatory networks. NETWORKINFERENCE performed well at recovering feedback loops and multiple targets of a regulator with small amounts of data, but required more data to recover multiple regulators of a gene. When the same number of data samples was collected at different intervals from the system, the best recovery came from sampling intervals long enough that sampling covered the propagation of regulation through the network, but not so long that the intervals missed internal dynamics. These results further elucidate the possibilities and limitations of network inference based on biological data.

Relevance: 20.00%

Abstract:

This is the second installment of a three-part project to publish a group of ten Ptolemaic papyri purchased by Yale’s Beinecke Library in 1998 (acquisition “1998b”), which came to the Beinecke as three hard wads that were apparently the stuffing from the stomach cavity of a mummified animal. This article publishes: (1) P.CtYBR inv. 5019, a fragment of line ends in iambic tetrameter catalectic meter from an unknown comedy; the format suggests that this is a further example of a certain type of Ptolemaic writing exercise. (2) P.CtYBR inv. 5043, a fragmentary grammatical text of uncertain import.