2 resultados para Remote Data Acquisition and Storage
em Collection Of Biostatistics Research Archive
Resumo:
The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database as well as to synchronize local copies of the data to the remote version of the database. Users efficiently access information from a remote database by retrieving only the data files indexed by user-specified keys and caching this data in a local copy of the remote database. The local and remote counterparts of the stashR package offer the potential to enhance reproducible research by allowing users of Sweave to cache their R computations for a research paper in a localDB database. This database can then be stored on the Internet as a remoteDB database. When readers of the research paper wish to reproduce the computations involved in creating a specific figure or calculating a specific numeric value, they can access the remoteDB database and obtain the R objects involved in the computation.
Resumo:
This article gives an overview over the methods used in the low--level analysis of gene expression data generated using DNA microarrays. This type of experiment allows to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of microarray scanned images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single gene and multiple gene levels, and the assessment of significance of these findings, considering the fact that there is a lot of noise and thus random features in the data. For all of these components, access to a flexible and efficient statistical computing environment is an essential aspect.