8 results for Latent class analysis

in CaltechTHESIS


Relevance:

40.00%

Publisher:

Abstract:

The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.

It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
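For orientation, the baseline that REM is said to improve on can be sketched as follows. This is a minimal illustration of standard Expectation Maximization for a two-component one-dimensional Gaussian mixture, not the REM algorithm itself, whose details the abstract does not give. The function names, the initialization, and the toy data are all assumptions made for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gaussian_mixture(data, n_iter=50, var_floor=1e-6):
    """Standard EM (not REM) for a two-component 1-D Gaussian mixture."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Hypothetical initialization: means at the data extremes, shared variance.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each latent class for each point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, var_floor)  # guard against collapse
    return weights, means, variances

# Toy data (made up for illustration): two well-separated clusters.
toy_data = [-0.5, 0.0, 0.5, 4.5, 5.0, 5.5]
weights, means, variances = em_gaussian_mixture(toy_data)
```

Plain EM of this kind only climbs to the nearest likelihood maximum from its starting point, which is the local-optimum problem the abstract says REM appears to alleviate.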

The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new principled algorithms for smoothing and clustering of spike data.

Relevance:

30.00%

Publisher:

Abstract:

Vortex rings constitute the main structure in the wakes of a wide class of swimming and flying animals, as well as in cardiac flows and in the jets generated by some mosses and fungi. However, there is a physical limit, determined by an energy maximization principle called the Kelvin-Benjamin principle, to the size that axisymmetric vortex rings can achieve. The existence of this limit is known to lead to the separation of a growing vortex ring from the shear layer feeding it, a process known as 'vortex pinch-off', and characterized by the dimensionless vortex formation number. The goal of this thesis is to improve our understanding of vortex pinch-off as it relates to biological propulsion, and to provide future researchers with tools to assist in identifying and predicting pinch-off in biological flows.

To this end, we introduce a method for identifying pinch-off in starting jets using the Lagrangian coherent structures in the flow, and apply this criterion to an experimentally generated starting jet. Since most naturally occurring vortex rings are not circular, we extend the definition of the vortex formation number to include non-axisymmetric vortex rings, and find that the formation number for moderately non-axisymmetric vortices is similar to that of circular vortex rings. This suggests that naturally occurring vortex rings may be modeled as axisymmetric vortex rings. Therefore, we consider the perturbation response of the Norbury family of axisymmetric vortex rings. This family is chosen to model vortex rings of increasing thickness and circulation, and their response to prolate shape perturbations is simulated using contour dynamics. Finally, the response of more realistic models for vortex rings, constructed from experimental data using nested contours, to perturbations which resemble those encountered by forming vortices more closely, is simulated using contour dynamics. In both families of models, a change in response analogous to pinch-off is found as members of the family with progressively thicker cores are considered. We posit that this analogy may be exploited to understand and predict pinch-off in complex biological flows, where current methods are not applicable in practice, and criteria based on the properties of vortex rings alone are necessary.

Relevance:

30.00%

Publisher:

Abstract:

Interleukin-2 (IL-2) is an important mediator in the vertebrate immune system. IL-2 is a potent growth factor that mature T lymphocytes use as a proliferation signal and the production of IL-2 is crucial for the clonal expansion of antigen-specific T cells in the primary immune response. IL-2 driven proliferation is dependent on the interaction of the lymphokine with its cognate multichain receptor. IL-2 expression is induced only upon stimulation and transcriptional activation of the IL-2 gene relies extensively on the coordinate interaction of numerous inducible and constitutive trans-acting factors. Over the past several years, thousands of papers have been published regarding molecular and cellular aspects of IL-2 gene expression and IL-2 function. The vast majority of these reports describe work that has been carried out in vitro. However, considerably less is known about control of IL-2 gene expression and IL-2 function in vivo.

To gain new insight into the regulation of IL-2 gene expression in vivo, anatomical and developmental patterns of IL-2 gene expression in the mouse were established by employing in situ hybridization and immunohistochemical staining methodologies to tissue sections generated from normal mice and mutant animals in which T-cell development was perturbed. Results from these studies revealed several interesting aspects of IL-2 gene expression, such as (1) induction of IL-2 gene expression and protein synthesis in the thymus, the primary site of T-cell development in the body, (2) cell-type specificity of IL-2 gene expression in vivo, (3) participation of IL-2 in the extrathymic expansion of mature T cells in particular tissues, independent of an acute immune response to foreign antigen, (4) involvement of IL-2 in maintaining immunologic balance in the mucosal immune system, and (5) potential function of IL-2 in early events associated with hematopoiesis.

Extensive analysis of IL-2 mRNA accumulation and protein production in the murine thymus at various stages of development established the existence of two classes of intrathymic IL-2 producing cells. One class of intrathymic IL-2 producers was found exclusively in the fetal thymus. Cells belonging to this subset were restricted to the outermost region of the thymus. IL-2 expression in the fetal thymus was highly transient; a dramatic peak of IL-2 mRNA accumulation was identified at day 14.5 of gestation and maximal IL-2 protein production was observed 12 hours later, after which both IL-2 mRNA and protein levels rapidly decreased. Significantly, the presence of IL-2 expressing cells in the day 14-15 fetal thymus was not contingent on the generation of T-cell receptor (TcR) positive cells. The second class of IL-2 producing cells was also detectable in the fetal thymus (cells found in this class represented a minority subset of IL-2 producers in the fetal thymus) but persisted in the thymus during later stages of development and after birth. Intrathymic IL-2 producers in postnatal animals were located in the subcapsular region and cortex, indicating that these cells reside in the same areas where immature T cells are consigned. The frequency of IL-2 expressing cells in the postnatal thymus was extremely low, indicating that induction of IL-2 expression and protein synthesis are indicative of a rare activation event. Unlike the fetal class of intrathymic IL-2 producers, the presence of IL-2 producing cells in the postnatal thymus was dependent on the generation of TcR+ cells. Subsequent examination of intrathymic IL-2 production in mutant postnatal mice unable to produce either αβ or γδ T cells showed that postnatal IL-2 producers in the thymus belong to both αβ and γδ lineages. Additionally, further studies indicated that IL-2 synthesis by immature αβ-T cells depends on the expression of bona fide TcR αβ-heterodimers. Taken together, IL-2 production in the postnatal thymus relies on the generation of αβ or γδ-TcR^+ cells and induction of IL-2 protein synthesis can be linked to an activation event mediated via the TcR.

With regard to tissue specificity of IL-2 gene expression in vivo, analysis of whole body sections obtained from normal neonatal mouse pups by in situ hybridization demonstrated that IL-2 mRNA^+ cells were found in both lymphoid and nonlymphoid tissues with which T cells are associated, such as the thymus (as described above), dermis and gut. Tissues devoid of IL-2 mRNA^+ cells included brain, heart, lung, liver, stomach, spine, spinal cord, kidney, and bladder. Additional analysis of isolated tissues taken from older animals revealed that IL-2 expression was undetectable in bone marrow and in nonactivated spleen and lymph nodes. Thus, it appears that extrathymic IL-2 expressing cells in nonimmunologically challenged animals are relegated to particular epidermal and epithelial tissues in which characterized subsets of T cells reside and that induction of IL-2 gene expression associated with these tissues may be a result of T-cell activation therein.

Based on the neonatal in situ hybridization results, a detailed investigation into possible induction of IL-2 expression resulting in IL-2 protein synthesis in the skin and gut revealed that IL-2 expression is induced in the epidermis and intestine and IL-2 protein is available to drive cell proliferation of resident cells and/or participate in immune function in these tissues. Pertaining to IL-2 expression in the skin, maximal IL-2 mRNA accumulation and protein production were observed when resident Vγ_3^+ T-cell populations were expanding. At this age, both IL-2 mRNA^+ cells and IL-2 protein production were intimately associated with hair follicles. Likewise, at this age a significant number of CD3ε^+ cells were also found in association with follicles. The colocalization of IL-2 expression and CD3ε^+ cells suggests that IL-2 expression is induced when T cells are in contact with hair follicles. In contrast, neither IL-2 mRNA nor IL-2 protein were readily detected once T-cell density in the skin reached steady-state proportions. At this point, T cells were no longer found associated with hair follicles but were evenly distributed throughout the epidermis. In addition, IL-2 expression in the skin was contingent upon the presence of mature T cells therein and induction of IL-2 protein synthesis in the skin did not depend on the expression of a specific TcR on resident T cells. These newly disclosed properties of IL-2 expression in the skin indicate that IL-2 may play an additional role in controlling mature T-cell proliferation by participating in the extrathymic expansion of T cells, particularly those associated with the epidermis.

Finally, regarding IL-2 expression and protein synthesis in the gut, IL-2 producing cells were found associated with the lamina propria of neonatal animals and gut-associated IL-2 production persisted throughout life. In older animals, the frequency of IL-2 producing cells in the small intestine was not identical to that in the large intestine and this difference may reflect regional specialization of the mucosal immune system in response to enteric antigen. Similar to other instances of IL-2 gene expression in vivo, a failure to generate mature T cells also led to an abrogation of IL-2 protein production in the gut. The presence of IL-2 producing cells in the neonatal gut suggested that these cells may be generated during fetal development. Examination of the fetal gut to determine the distribution of IL-2 producing cells therein indicated that there was a tenfold increase in the number of gut-associated IL-2 producers at day 20 of gestation compared to that observed four days earlier and there was little difference between the frequency of IL-2 producing cells in prenatal versus neonatal gut. The origin of these fetally-derived IL-2 producing cells is unclear. Prior to the immigration of IL-2 inducible cells to the fetal gut and/or induction of IL-2 expression therein, IL-2 protein was observed in the fetal liver and fetal omentum, as well as the fetal thymus. Considering that induction of IL-2 protein synthesis may be an indication of future functional capability, detection of IL-2 producing cells in the fetal liver and fetal omentum raises the possibility that IL-2 producing cells in the fetal gut may be extrathymic in origin and IL-2 producing cells in these fetal tissues may not belong solely to the T lineage. 
Overall, these results provide increased understanding of the nature of IL-2 producing cells in the gut and how the absence of IL-2 production therein and in fetal hematopoietic tissues can result in the acute pathology observed in IL-2 deficient animals.

Relevance:

30.00%

Publisher:

Abstract:

Signal processing techniques play important roles in the design of digital communication systems. These include information manipulation, transmitter signal processing, channel estimation, channel equalization and receiver signal processing. By interacting with communication theory and system implementing technologies, signal processing specialists develop efficient schemes for various communication problems by wisely exploiting various mathematical tools such as analysis, probability theory, matrix theory, optimization theory, and many others. In recent years, researchers realized that multiple-input multiple-output (MIMO) channel models are applicable to a wide range of different physical communications channels. Using the elegant matrix-vector notations, many MIMO transceiver (including the precoder and equalizer) design problems can be solved by matrix and optimization theory. Furthermore, the researchers showed that the majorization theory and matrix decompositions, such as singular value decomposition (SVD), geometric mean decomposition (GMD) and generalized triangular decomposition (GTD), provide unified frameworks for solving many of the point-to-point MIMO transceiver design problems.

In this thesis, we consider the transceiver design problems for linear time invariant (LTI) flat MIMO channels, linear time-varying narrowband MIMO channels, flat MIMO broadcast channels, and doubly selective scalar channels. Additionally, the channel estimation problem is also considered. The main contributions of this dissertation are the development of new matrix decompositions, and the uses of the matrix decompositions and majorization theory toward the practical transmit-receive scheme designs for transceiver optimization problems. Elegant solutions are obtained, novel transceiver structures are developed, ingenious algorithms are proposed, and performance analyses are derived.

The first part of the thesis focuses on transceiver design with LTI flat MIMO channels. We propose a novel matrix decomposition which decomposes a complex matrix as a product of several sets of semi-unitary matrices and upper triangular matrices in an iterative manner. The complexity of the new decomposition, generalized geometric mean decomposition (GGMD), is always less than or equal to that of geometric mean decomposition (GMD). The optimal GGMD parameters which yield the minimal complexity are derived. Based on the channel state information (CSI) at both the transmitter (CSIT) and receiver (CSIR), GGMD is used to design a butterfly structured decision feedback equalizer (DFE) MIMO transceiver which achieves the minimum average mean square error (MSE) under the total transmit power constraint. A novel iterative receiving detection algorithm for the specific receiver is also proposed. For the application to cyclic prefix (CP) systems in which the SVD of the equivalent channel matrix can be easily computed, the proposed GGMD transceiver has K/log_2(K) times complexity advantage over the GMD transceiver, where K is the number of data symbols per data block and is a power of 2. The performance analysis shows that the GGMD DFE transceiver can convert a MIMO channel into a set of parallel subchannels with the same bias and signal to interference plus noise ratios (SINRs). Hence, the average bit error rate (BER) is automatically minimized without the need for bit allocation. Moreover, the proposed transceiver can achieve the channel capacity simply by applying independent scalar Gaussian codes of the same rate at subchannels.

In the second part of the thesis, we focus on MIMO transceiver design for slowly time-varying MIMO channels with zero-forcing or MMSE criterion. Even though the GGMD/GMD DFE transceivers work for slowly time-varying MIMO channels by exploiting the instantaneous CSI at both ends, their performance is by no means optimal since the temporal diversity of the time-varying channels is not exploited. Based on the GTD, we develop space-time GTD (ST-GTD) for the decomposition of linear time-varying flat MIMO channels. Under the assumption that CSIT, CSIR and channel prediction are available, by using the proposed ST-GTD, we develop space-time geometric mean decomposition (ST-GMD) DFE transceivers under the zero-forcing or MMSE criterion. Under perfect channel prediction, the new system minimizes both the average MSE at the detector in each space-time (ST) block (which consists of several coherence blocks), and the average per ST-block BER in the moderate high SNR region. Moreover, the ST-GMD DFE transceiver designed under an MMSE criterion maximizes Gaussian mutual information over the equivalent channel seen by each ST-block. In general, the newly proposed transceivers perform better than the GGMD-based systems since the super-imposed temporal precoder is able to exploit the temporal diversity of time-varying channels. For practical applications, a novel ST-GTD based system which does not require channel prediction but shares the same asymptotic BER performance with the ST-GMD DFE transceiver is also proposed.

The third part of the thesis considers two quality of service (QoS) transceiver design problems for flat MIMO broadcast channels. The first one is the power minimization problem (min-power) with a total bitrate constraint and per-stream BER constraints. The second problem is the rate maximization problem (max-rate) with a total transmit power constraint and per-stream BER constraints. Exploiting a particular class of joint triangularization (JT), we are able to jointly optimize the bit allocation and the broadcast DFE transceiver for the min-power and max-rate problems. The resulting optimal designs are called the minimum power JT broadcast DFE transceiver (MPJT) and maximum rate JT broadcast DFE transceiver (MRJT), respectively. In addition to the optimal designs, two suboptimal designs based on QR decomposition are proposed. They are realizable for arbitrary number of users.

Finally, we investigate the design of a discrete Fourier transform (DFT) modulated filterbank transceiver (DFT-FBT) with LTV scalar channels. For both cases with known LTV channels and unknown wide sense stationary uncorrelated scattering (WSSUS) statistical channels, we show how to optimize the transmitting and receiving prototypes of a DFT-FBT such that the SINR at the receiver is maximized. Also, a novel pilot-aided subspace channel estimation algorithm is proposed for orthogonal frequency division multiplexing (OFDM) systems with quasi-stationary multi-path Rayleigh fading channels. Using the concept of a difference co-array, the new technique can construct M^2 co-pilots from M physical pilot tones with alternating pilot placement. Subspace methods, such as MUSIC and ESPRIT, can be used to estimate the multipath delays and the number of identifiable paths is up to O(M^2), theoretically. With the delay information, an MMSE estimator for frequency response is derived. It is shown through simulations that the proposed method outperforms the conventional subspace channel estimator when the number of multipaths is greater than or equal to the number of physical pilots minus one.
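The difference co-array idea can be illustrated with a few lines of Python. This is only a sketch of the general concept: the pilot indices below are a hypothetical sparse placement chosen for the example, not the alternating placement used in the thesis, and `difference_coarray` is an illustrative name.

```python
def difference_coarray(pilot_positions):
    """Return the sorted set of all pairwise differences p_i - p_j."""
    return sorted({a - b for a in pilot_positions for b in pilot_positions})

# Hypothetical pilot tone indices (M = 4), chosen so that no difference repeats.
pilots = [0, 1, 4, 6]
lags = difference_coarray(pilots)

# M physical pilots can yield up to M*(M-1) + 1 distinct lags, i.e. O(M^2).
max_distinct = len(pilots) * (len(pilots) - 1) + 1
print(len(lags), "distinct lags out of a possible", max_distinct)
```

For this placement every lag from -6 to 6 appears, so four physical pilots behave like a contiguous virtual array of thirteen elements. That quadratic growth is what lets subspace methods resolve more paths than there are physical pilots.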

Relevance:

30.00%

Publisher:

Abstract:

The connections between convexity and submodularity are explored, for purposes of minimizing and learning submodular set functions.

First, we develop a novel method for minimizing a particular class of submodular functions, which can be expressed as a sum of concave functions composed with modular functions. The basic algorithm uses an accelerated first order method applied to a smoothed version of its convex extension. The smoothing algorithm is particularly novel as it allows us to treat general concave potentials without needing to construct a piecewise linear approximation as with graph-based techniques.
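To make the function class concrete, the sketch below builds a small objective of the form f(S) = sum_j g_j(w_j(S)), with each g_j concave and each w_j modular, and verifies submodularity by brute force. It does not implement the smoothed accelerated method from the thesis. The weights, the choice of the square root as the concave function, and all names are assumptions made for illustration.

```python
import math
from itertools import combinations

def make_objective(weight_vectors, concave=math.sqrt):
    """Build f(S) = sum_j concave(sum_{i in S} w_j[i]) for nonnegative weights."""
    def f(subset):
        return sum(concave(sum(w[i] for i in subset)) for w in weight_vectors)
    return f

def all_subsets(n):
    """Enumerate every subset of {0, ..., n-1} as a frozenset."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def is_submodular(f, n, tol=1e-12):
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for all pairs (exponential cost)."""
    subsets = all_subsets(n)
    return all(f(a) + f(b) + tol >= f(a | b) + f(a & b)
               for a in subsets for b in subsets)

# Hypothetical nonnegative weights for two concave-of-modular terms.
f = make_objective([[1.0, 2.0, 0.5, 3.0], [0.2, 0.1, 4.0, 1.0]])
print(is_submodular(f, 4))                       # concave-of-modular sum
print(is_submodular(lambda s: len(s) ** 2, 3))   # convex in |S|, for contrast
```

The exhaustive check is only feasible for tiny ground sets, which is the reason efficient minimization algorithms for this class are of interest.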

Second, we derive the general conditions under which it is possible to find a minimizer of a submodular function via a convex problem. This provides a framework for developing submodular minimization algorithms. The framework is then used to develop several algorithms that can be run in a distributed fashion. This is particularly useful for applications where the submodular objective function consists of a sum of many terms, each term dependent on a small part of a large data set.

Lastly, we approach the problem of learning set functions from an unorthodox perspective---sparse reconstruction. We demonstrate an explicit connection between the problem of learning set functions from random evaluations and that of sparse signals. Based on the observation that the Fourier transform for set functions satisfies exactly the conditions needed for sparse reconstruction algorithms to work, we examine some different function classes under which uniform reconstruction is possible.

Relevance:

30.00%

Publisher:

Abstract:

For damaging response, the force-displacement relationship of a structure is highly nonlinear and history-dependent. For satisfactory analysis of such behavior, it is important to be able to characterize and to model the phenomenon of hysteresis accurately. A number of models have been proposed for response studies of hysteretic structures, some of which are examined in detail in this thesis. There are two popular classes of models used in the analysis of curvilinear hysteretic systems. The first is of the distributed element or assemblage type, which models the physical behavior of the system by using well-known building blocks. The second class of models is of the differential equation type, which is based on the introduction of an extra variable to describe the history dependence of the system.

Owing to their mathematical simplicity, the latter models have been used extensively for various applications in structural dynamics, most notably in the estimation of the response statistics of hysteretic systems subjected to stochastic excitation. But the fundamental characteristics of these models are still not clearly understood. A response analysis of systems using both the Distributed Element model and the differential equation model when subjected to a variety of quasi-static and dynamic loading conditions leads to the following conclusion: Caution must be exercised when employing the models belonging to the second class in structural response studies as they can produce misleading results.
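As an illustration of the differential equation class of models, the sketch below integrates a Bouc-Wen type law, in which an extra variable z carries the history dependence. The abstract does not say which differential equation model the thesis examines, so Bouc-Wen is an assumption here, as are the parameter values, the loading path, and the function name.

```python
import math

def bouc_wen_path(displacements, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Explicit Euler integration of
    dz = A*dx - beta*|dx|*|z|^n*sign(z) - gamma*dx*|z|^n
    along a prescribed displacement history; returns z at each point."""
    z = 0.0
    path = [z]
    for x_prev, x_next in zip(displacements, displacements[1:]):
        dx = x_next - x_prev
        sign_z = (z > 0) - (z < 0)
        dz = (A * dx
              - beta * abs(dx) * abs(z) ** n * sign_z
              - gamma * dx * abs(z) ** n)
        z += dz
        path.append(z)
    return path

# Hypothetical quasi-static loading: two cycles of amplitude 2.
steps_per_cycle = 1000
x = [2.0 * math.sin(2.0 * math.pi * k / steps_per_cycle)
     for k in range(2 * steps_per_cycle + 1)]
z = bouc_wen_path(x)

# The displacement is (numerically) zero at k = 500 and k = 1000, yet z differs:
print(z[500], z[1000])
```

The two printed values have opposite signs even though the displacement is the same, which is the history dependence the extra variable is meant to capture. Whether such a model reproduces physical hysteresis faithfully under general loading is the question the thesis raises.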

Masing's hypothesis, originally proposed for steady-state loading, can be extended to general transient loading as well, leading to considerable simplification in the analysis of the Distributed Element models. A simple, nonparametric identification technique is also outlined, by means of which an optimal model representation involving one additional state variable is determined for hysteretic systems.

Relevance:

30.00%

Publisher:

Abstract:

The epidemic of HIV/AIDS in the United States is constantly changing and evolving, starting from patient zero to now an estimated 650,000 to 900,000 Americans infected. The nature and course of HIV changed dramatically with the introduction of antiretrovirals. This discourse examines many different facets of HIV from the beginning where there wasn't any treatment for HIV until the present era of highly active antiretroviral therapy (HAART). By utilizing statistical analysis of clinical data, this paper examines where we were, where we are and projections as to where treatment of HIV/AIDS is headed.

Chapter Two describes the datasets that were used for the analyses. The primary database utilized was collected by myself from an outpatient HIV clinic. The data included dates from 1984 until the present. The second database was from the Multicenter AIDS Cohort Study (MACS) public dataset. The data from the MACS cover the time between 1984 and October 1992. Comparisons are made between both datasets.

Chapter Three discusses where we were. Before the first anti-HIV drugs (called antiretrovirals) were approved, there was no treatment to slow the progression of HIV. The first generation of antiretrovirals, reverse transcriptase inhibitors such as AZT (zidovudine), DDI (didanosine), DDC (zalcitabine), and D4T (stavudine) provided the first treatment for HIV. The first clinical trials showed that these antiretrovirals had a significant impact on increasing patient survival. The trials also showed that patients on these drugs had increased CD4+ T cell counts. Chapter Three examines the distributions of CD4 T cell counts. The results show that the estimated distributions of CD4 T cell counts are distinctly non-Gaussian. Thus distributional assumptions regarding CD4 T cell counts must be taken into account when performing analyses with this marker. The results also show the estimated CD4 T cell distributions for each disease stage: asymptomatic, symptomatic and AIDS are non-Gaussian. Interestingly, the distribution of CD4 T cell counts for the asymptomatic period is significantly below that of the CD4 T cell distribution for the uninfected population, suggesting that even in patients with no outward symptoms of HIV infection, there exist high levels of immunosuppression.

Chapter Four discusses where we are at present. HIV quickly grew resistant to reverse transcriptase inhibitors which were given sequentially as mono or dual therapy. As resistance grew, the positive effects of the reverse transcriptase inhibitors on CD4 T cell counts and survival dissipated. As the old era faded a new era characterized by a new class of drugs and new technology changed the way that we treat HIV-infected patients. Viral load assays were able to quantify the levels of HIV RNA in the blood. By quantifying the viral load, one now had a faster, more direct way to test antiretroviral regimen efficacy. Protease inhibitors, which attacked a different region of HIV than reverse transcriptase inhibitors, when used in combination with other antiretroviral agents were found to dramatically and significantly reduce the HIV RNA levels in the blood. Patients also experienced significant increases in CD4 T cell counts. For the first time in the epidemic, there was hope. It was hypothesized that with HAART, viral levels could be kept so low that the immune system as measured by CD4 T cell counts would be able to recover. If these viral levels could be kept low enough, it would be possible for the immune system to eradicate the virus. The hypothesis of immune reconstitution, that is bringing CD4 T cell counts up to levels seen in uninfected patients, is tested in Chapter Four. It was found that for these patients, there was not enough of a CD4 T cell increase to be consistent with the hypothesis of immune reconstitution.

In Chapter Five, the effectiveness of long-term HAART is analyzed. Survival analysis was conducted on 213 patients on long-term HAART. The primary endpoint was presence of an AIDS defining illness. A high level of clinical failure, or progression to an endpoint, was found.
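The abstract does not state which survival estimator was used. As one common choice, a Kaplan-Meier estimate with right-censoring can be sketched as below. The function name and the five-patient dataset are made up for illustration and are not taken from the thesis.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times[i]  : follow-up time for patient i
    events[i] : True if the endpoint was observed at times[i], False if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        failures = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failures and censored patients both leave the risk set
    return curve

# Made-up follow-up times (years) and endpoint indicators for five patients.
example_times = [1, 2, 2, 3, 4]
example_events = [True, True, False, True, False]
for t, s in kaplan_meier(example_times, example_events):
    print(t, round(s, 3))
```

Censored patients reduce the number at risk without lowering the curve, which is how the estimator uses partial follow-up from patients who had not reached an AIDS defining illness by the end of observation.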

Chapter Six yields insights into where we are going. New technology such as viral genotypic testing, that looks at the genetic structure of HIV and determines where mutations have occurred, has shown that HIV is capable of producing resistance mutations that confer multiple drug resistance. This section looks at resistance issues and speculates, ceteris paribus, where the state of HIV is going. This section first addresses viral genotype and the correlates of viral load and disease progression. A second analysis looks at patients who have failed their primary attempts at HAART and subsequent salvage therapy. It was found that salvage regimens, efforts to control viral replication through the administration of different combinations of antiretrovirals, were not effective in 90 percent of the population in controlling viral replication. Thus, primary attempts at therapy offer the best chance of viral suppression and delay of disease progression. Documentation of transmission of drug-resistant virus suggests that the public health crisis of HIV is far from over. Drug resistant HIV can sustain the epidemic and hamper our efforts to treat HIV infection. The data presented suggest that the decrease in the morbidity and mortality due to HIV/AIDS is transient. Deaths due to HIV will increase and public health officials must prepare for this eventuality unless new treatments become available. These results also underscore the importance of the vaccine effort.

The final chapter looks at the economic issues related to HIV. The direct and indirect costs of treating HIV/AIDS are very high. For the first time in the epidemic, there exists treatment that can actually slow disease progression. The direct costs for HAART are estimated. It is estimated that the direct lifetime cost of treating each HIV-infected patient with HAART is between $353,000 and $598,000, depending on how long HAART prolongs life. If one looks at the incremental cost per year of life saved, it is only $101,000. This is comparable with the incremental costs per year of life saved from coronary artery bypass surgery.

Policy makers need to be aware that although HAART can delay disease progression, it is not a cure and HIV is not over. The results presented here suggest that the decreases in the morbidity and mortality due to HIV are transient. Policymakers need to be prepared for the eventual increase in AIDS incidence and mortality. Costs associated with HIV/AIDS are also projected to increase. The cost savings seen recently have been from the dramatic decreases in the incidence of AIDS defining opportunistic infections. As patients who have been on HAART the longest start to progress to AIDS, policymakers and insurance companies will find that the cost of treating HIV/AIDS will increase.

Relevance:

30.00%

Publisher:

Abstract:

This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.

The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.

The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.

This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.