7 results for Data Streams Distribution

in CaltechTHESIS


Relevance:

80.00%

Abstract:

This thesis describes the design and implementation of a situation awareness application. The application gathers data from sensors including accelerometers for monitoring earthquakes, carbon monoxide sensors for monitoring fires, radiation detectors, and dust sensors. The application also gathers Internet data sources including data about traffic congestion on daily commute routes, information about hazards, news relevant to the user of the application, and weather. The application sends the data to a Cloud computing service which aggregates data streams from multiple sites and detects anomalies. Information from the Cloud service is then displayed by the application on a tablet, computer monitor, or television screen. The situation awareness application enables almost all members of a community to remain aware of critical changes in their environments.
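
The pipeline this abstract describes (per-site sensor streams feeding an aggregation service that flags anomalies) can be illustrated with a minimal sketch. The stream name, window size, and z-score threshold below are illustrative assumptions, not details taken from the thesis.

from collections import defaultdict, deque
from statistics import mean, stdev

WINDOW = 100     # readings retained per stream
THRESHOLD = 4.0  # flag readings this many standard deviations from the mean

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def ingest(stream_id, value):
    """Add a reading to its stream's window; return True if it is anomalous."""
    w = windows[stream_id]
    anomalous = False
    if len(w) >= 10:  # require some history before judging
        mu, sigma = mean(w), stdev(w)
        if sigma > 0 and abs(value - mu) > THRESHOLD * sigma:
            anomalous = True
    w.append(value)
    return anomalous

# Example: a quiet accelerometer stream, then a sudden spike.
for v in [0.01, 0.02, 0.00, 0.01, 0.02, 0.01, 0.00, 0.02, 0.01, 0.02, 0.01, 5.0]:
    if ingest("site42/accelerometer", v):
        print("anomaly:", v)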

Relevance:

30.00%

Abstract:

Data were taken in 1979-80 by the CCFRR high energy neutrino experiment at Fermilab. A total of 150,000 neutrino and 23,000 antineutrino charged current events in the approximate energy range 25 < E_ν < 250 GeV are measured and analyzed. The structure functions F_2 and xF_3 are extracted for three assumptions about σ_L/σ_T: R = 0, R = 0.1, and R = a QCD-based expression. Systematic errors are estimated and their significance is discussed. Comparisons of the x and Q^2 behaviour of the structure functions with results from other experiments are made.

We find that statistical errors currently dominate our knowledge of the valence quark distribution, which is studied in this thesis. xF_3 from different experiments has, within errors and apart from level differences, the same dependence on x and Q^2, except for the HPWF results. The CDHS F_2 shows a clear fall-off at low x relative to the CCFRR and EMC results, again apart from level differences, which are calculable from cross-sections.

The result for the GLS sum rule is found to be 2.83 ± .15 ± .09 ± .10, where the first error is statistical, the second is an overall level error, and the third covers the rest of the systematic errors. QCD studies of xF_3 to leading and second order have been done. The QCD evolution of xF_3, which is independent of R and the strange sea, does not depend on the gluon distribution, and fits yield

Λ_(LO) = 88^(+163)_(-78) ^(+113)_(-70) MeV

The systematic errors are smaller than the statistical errors. Second order fits give somewhat different values of Λ, although α_s (at Q^2_0 = 12.6 GeV^2) is not so different.
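
For context, the GLS sum rule being tested counts the number of valence quarks; in its textbook form (stated here for orientation, not quoted from the thesis), with the leading QCD correction it reads

∫_0^1 F_3(x, Q^2) dx = 3 [1 − α_s(Q^2)/π + O(α_s^2)],

so the measured 2.83 is to be compared with 3 reduced by the QCD correction.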

A fit using the better determined F_2 in place of xF_3 for x > 0.4, i.e., assuming q̄ = 0 in that region, gives

Λ_(LO) = 266^(+114)_(-104) ^(+85)_(-79) MeV

Again, the statistical errors are larger than the systematic errors. An attempt to measure R was made and the measurements are described. Utilizing the inequality q̄(x) ≥ 0, we find that in the region x > 0.4, R is less than 0.55 at the 90% confidence level.

Relevance:

30.00%

Abstract:

The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.

It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general optimization algorithm, called Relaxation Expectation Maximization (REM), is proposed that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques, the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
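
REM itself is particular to this thesis, but the standard EM loop it builds on is easy to state. Below is a minimal sketch for a one-dimensional Gaussian mixture; the toy data and all parameter choices are illustrative, and nothing here reproduces REM's relaxation step.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])  # toy data

K = 2
pi = np.full(K, 1.0 / K)   # mixing weights
mu = rng.choice(x, K)      # component means
var = np.full(K, x.var())  # component variances

for _ in range(100):
    # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, var_k)
    log_p = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
    r = pi * np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibility-weighted data
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print("means:", mu, "weights:", pi)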

The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model leads to new principled algorithms for smoothing and clustering of spike data.

Relevance:

30.00%

Abstract:

A research program was designed (1) to map regional lithological units of the lunar surface based on measurements of spatial variations in spectral reflectance, and (2) to establish the sequence of formation of such lithological units from measurements of the accumulated effects of impacting bodies.

Spectral reflectance data were obtained by scanning luminance variations over the lunar surface at three wavelengths (0.4µ, 0.52µ, and 0.7µ). These luminance measurements were reduced to normalized spectral reflectance values relative to a standard area in Mare Serenitatis. The spectral type of each lunar area was identified from the shape of its reflectance spectrum. From these data, lithological units or regions of constant color were identified. The maria fall into two major spectral classes: circular maria like Mare Serenitatis contain S-type or red material, and thin, irregular, expansive maria like Mare Tranquillitatis contain T-type or blue material. Four distinct subtypes of S-type reflectances and two of T-type reflectances exist. As these six subtypes occur in a number of lunar regions, it is concluded that they represent specific types of material rather than some homologous set of a few end members.
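
The normalization step described here reduces to a wavelength-by-wavelength division. A minimal sketch, with invented luminance numbers (only the three wavelengths come from the text, and the slope-based classification is a crude stand-in for matching the full spectrum shape):

wavelengths = [0.40, 0.52, 0.70]                 # microns, as in the text
standard = {0.40: 0.82, 0.52: 1.00, 0.70: 1.15}  # standard-area scan (invented values)

def normalized_reflectance(luminance):
    """Divide an area's luminance by the standard area's, wavelength by wavelength."""
    return {w: luminance[w] / standard[w] for w in wavelengths}

def spectral_type(rel):
    """Classify by spectral slope: relatively brighter toward 0.7µ suggests
    red (S-type) material, toward 0.4µ blue (T-type)."""
    return "S-type (red)" if rel[0.70] > rel[0.40] else "T-type (blue)"

area = {0.40: 0.70, 0.52: 0.95, 0.70: 1.20}      # hypothetical scan of one region
rel = normalized_reflectance(area)
print(rel, spectral_type(rel))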

The relative ages or sequence of formation of these mare units were established from measurements of the accumulated impacts which have occurred since mare formation. A model was developed which relates the integrated flux of particles which have impacted a surface to the distribution of craters as functions of size and shape. Erosion of craters is caused chiefly by small bodies which produce negligible individual changes in crater shape. Hence the shape of a crater can be used to estimate the total number of small impacts that have occurred since the crater was formed. Relative ages of a surface can then be obtained from measurements of the slopes of the walls of the oldest craters formed on the surface. The results show that different maria and regions within them were emplaced at different times. An approximate absolute time scale was derived from Apollo 11 crystallization ages under an assumption of a constant rate of impacting for the last 4 x 10^9 yrs. Assuming constant flux, the period of mare formation lasted from over 4 x 10^9 yrs ago to about 1.5 x 10^9 yrs ago.

A synthesis of the results of relative age measurements and of spectral reflectance mapping shows that (1) the formation of the lunar maria occurred in three stages; material of only one spectral type was deposited in each stage, (2) two distinct kinds of maria exist, each type distinguished by morphology, structure, gravity anomalies, time of formation, and spectral reflectance type, and (3) individual maria have complicated histories; they contain a variety of lithic units emplaced at different times.

Relevance:

30.00%

Abstract:

In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that in fact mismatched training and test distributions can yield better out-of-sample performance. This optimal performance can be obtained by training with the dual distribution. This optimal training distribution depends on the test distribution set by the problem, but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, as well as how to approximate it in a practical scenario. Benefits of using this distribution are exemplified in both synthetic and real data sets.

In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the use of weights regarding its effect on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm that determines if, for a given set of weights, the out-of-sample performance will improve or not in a practical setting. This is necessary as the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
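
The generic covariate-shift machinery behind this part is importance weighting, w(x) = p_target(x)/p_train(x), with p_target playing the role of the dual (or test) distribution. A minimal sketch for a weighted least-squares fit, with both densities known Gaussians purely for illustration:

import numpy as np

rng = np.random.default_rng(1)

def p_train(x):   # training inputs drawn from N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def p_target(x):  # target density, here N(1, 1)
    return np.exp(-0.5 * (x - 1) ** 2) / np.sqrt(2 * np.pi)

x = rng.normal(0, 1, 500)
y = np.sin(x) + 0.1 * rng.normal(size=500)  # toy target function

w = p_target(x) / p_train(x)                # importance weights
w /= w.mean()

# Weighted linear least squares with features [1, x]
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print("weighted fit:", beta)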

Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance on a large real dataset such as the Netflix dataset. Their low computational complexity is their main advantage over previous algorithms proposed in the covariate shift literature.

In the second part of the thesis we apply machine learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system for analyzing the behavior of animals in video with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), detects movemes, actions, and stories from time series describing the positions of animals in videos. The method summarizes the data and provides biologists with a mathematical tool for testing new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example, according to their genetic line.

Relevance:

30.00%

Abstract:

This study concerns the longitudinal dispersion of fluid particles which are initially distributed uniformly over one cross section of a uniform, steady, turbulent open channel flow. The primary focus is on developing a method to predict the rate of dispersion in a natural stream.

Taylor's method of determining a dispersion coefficient, previously applied to flow in pipes and two-dimensional open channels, is extended to a class of three-dimensional flows which have large width-to-depth ratios, and in which the velocity varies continuously with lateral cross-sectional position. Most natural streams are included. The dispersion coefficient for a natural stream may be predicted from measurements of the channel cross-sectional geometry, the cross-sectional distribution of velocity, and the overall channel shear velocity. Tracer experiments are not required.
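
In the notation that later became standard for this extension (assumed here for orientation, not quoted from the thesis), with u′(y) the deviation of the depth-averaged downstream velocity from its cross-sectional mean, d(y) the local depth, ε_t the transverse mixing coefficient, A the cross-sectional area, and W the width, the predicted coefficient takes the triple-integral form

D = −(1/A) ∫_0^W u′(y) d(y) [ ∫_0^y (1/(ε_t d(y′))) ( ∫_0^(y′) u′(y″) d(y″) dy″ ) dy′ ] dy,

which indeed requires only the measured geometry and velocity distribution.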

Large values of the dimensionless dispersion coefficient D/rU* are explained by lateral variations in downstream velocity. In effect, the characteristic length of the cross section is shown to be proportional to the width, rather than the hydraulic radius. The dimensionless dispersion coefficient depends approximately on the square of the width to depth ratio.

A numerical program is given which is capable of generating the entire dispersion pattern downstream from an instantaneous point or plane source of pollutant. The program is verified by the theory for two-dimensional flow, and gives results in good agreement with laboratory and field experiments.
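
The program itself is not reproduced here, but the kind of computation it performs can be sketched as an explicit finite-difference solution of the one-dimensional advection-dispersion equation, ∂c/∂t + U ∂c/∂x = D ∂^2c/∂x^2, from an instantaneous plane source. All parameter values below are illustrative.

import numpy as np

U, D = 0.5, 0.1  # mean velocity (m/s) and dispersion coefficient (m^2/s)
L, N = 100.0, 1000
dx = L / N
dt = 0.4 * min(dx / U, dx**2 / (2 * D))  # respect advection and diffusion limits

c = np.zeros(N)
c[N // 10] = 1.0 / dx  # unit-mass plane source near the upstream end

for _ in range(2000):
    adv = -U * (c - np.roll(c, 1)) / dx                          # upwind advection
    dif = D * (np.roll(c, -1) - 2 * c + np.roll(c, 1)) / dx**2   # central diffusion
    c = c + dt * (adv + dif)

print("peak at x =", round(c.argmax() * dx, 1), "m after", round(2000 * dt), "s")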

Both laboratory and field experiments are described. Twenty-one laboratory experiments were conducted: thirteen in two-dimensional flows, over both smooth and roughened bottoms; and eight in three-dimensional flows, formed by adding extreme side roughness to produce lateral velocity variations. Four field experiments were conducted in the Green-Duwamish River, Washington.

Both the laboratory and field experiments show that in three-dimensional flow the dominant mechanism for dispersion is lateral velocity variation. For instance, in one laboratory experiment the dimensionless dispersion coefficient D/rU* (where r is the hydraulic radius and U* the shear velocity) was increased by a factor of ten by roughening the channel banks. In three-dimensional laboratory flow, D/rU* varied from 190 to 640, a typical range for natural streams. For each experiment, the measured dispersion coefficient agreed with that predicted by the extension of Taylor's analysis within a maximum error of 15%. For the Green-Duwamish River, the average experimentally measured dispersion coefficient was within 5% of the prediction.

Relevance:

30.00%

Abstract:

Techniques are developed for estimating activity profiles in fixed bed reactors and catalyst deactivation parameters from operating reactor data. These techniques are applicable, in general, to most industrial catalytic processes. The catalytic reforming of naphthas is taken as a broad example to illustrate the estimation schemes and to clarify the physical meaning of the kinetic parameters of the estimation equations. The work is described in two parts. Part I deals with the modeling of kinetic rate expressions and the derivation of the working equations for estimation. Part II concentrates on developing various estimation techniques.

Part I: The reactions used to describe naphtha reforming are dehydrogenation and dehydroisomerization of cycloparaffins; isomerization, dehydrocyclization and hydrocracking of paraffins; and the catalyst deactivation reactions, namely coking on alumina sites and sintering of platinum crystallites. The rate expressions for the above reactions are formulated, and the effects of transport limitations on the overall reaction rates are discussed in the appendices. Moreover, various types of interaction between the metallic and acidic active centers of reforming catalysts are discussed as characterizing the different types of reforming reactions.

Part II: In catalytic reactor operation, the activity distribution along the reactor determines the kinetics of the main reaction and is needed for predicting the effect of changes in the feed state and the operating conditions on the reactor output. In the case of a monofunctional catalyst and of bifunctional catalysts in limiting conditions, the cumulative activity is sufficient for predicting steady reactor output. The estimation of this cumulative activity can be carried out easily from measurements at the reactor exit. For a general bifunctional catalytic system, the detailed activity distribution is needed for describing the reactor operation, and some approximation must be made to obtain practicable estimation schemes. This is accomplished by parametrization techniques using measurements at a few points along the reactor. Such parametrization techniques are illustrated numerically with a simplified model of naphtha reforming.

To determine long term catalyst utilization and regeneration policies, it is necessary to estimate catalyst deactivation parameters from the current operating data. For a first order deactivation model with a monofunctional catalyst or with a bifunctional catalyst in special limiting circumstances, analytical techniques are presented to transform the partial differential equations to ordinary differential equations which admit more feasible estimation schemes. Numerical examples include the catalytic oxidation of butene to butadiene and a simplified model of naphtha reforming. For a general bifunctional system or in the case of a monofunctional catalyst subject to general power law deactivation, the estimation can only be accomplished approximately. The basic feature of an appropriate estimation scheme involves approximating the activity profile by certain polynomials and then estimating the deactivation parameters from the integrated form of the deactivation equation by regression techniques. Different bifunctional systems must be treated by different estimation algorithms, which are illustrated by several cases of naphtha reforming with different feed or catalyst composition.
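
In generic form (textbook notation, assumed rather than quoted from the thesis), the two deactivation models mentioned integrate to

da/dt = −k_d a    ⟹  a(t) = a_0 e^(−k_d t)
da/dt = −k_d a^n  ⟹  a(t) = [a_0^(1−n) − (1−n) k_d t]^(1/(1−n)),  n ≠ 1,

and the regression step then fits k_d (and n) to activity values reconstructed at a few points along the reactor.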