973 results for PARTON DISTRIBUTIONS
Abstract:
In the first part of the thesis we explore three fundamental questions that arise naturally when we consider a machine learning scenario in which the training and test distributions can differ. Contrary to conventional wisdom, we show that mismatched training and test distributions can in fact yield better out-of-sample performance. This optimal performance is obtained by training with the dual distribution, which depends on the test distribution set by the problem, but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, and how to approximate it in a practical scenario. The benefits of using this distribution are exemplified on both synthetic and real data sets.
In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the effect of weights on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm, which determines, for a given set of weights, whether out-of-sample performance will improve in a practical setting. This is necessary because the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance on a large real dataset, the Netflix dataset. Their lower computational complexity is the main source of their advantage over previous algorithms proposed in the covariate shift literature.
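The weighting idea described above can be sketched numerically. The sketch below is an illustration only, not the thesis's actual algorithm: importance weights (the ratio of target density to training density) make a fixed training sample behave, on average, as if it were drawn from a target distribution. All densities, parameters, and names here are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training inputs drawn from N(0, 1); the target distribution is N(1, 1).
x_train = rng.normal(0.0, 1.0, size=1000)

def gauss_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2), written out to keep the sketch self-contained.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights: target density over training density, normalized to sum to one.
w = gauss_pdf(x_train, 1.0, 1.0) / gauss_pdf(x_train, 0.0, 1.0)
w /= w.sum()

# The weighted mean of the training sample approximates the target mean (1.0),
# even though the raw sample mean is near 0.
print(np.sum(w * x_train))
```

Any weighted statistic (or weighted loss in training) can be computed the same way; the cost of weighting is increased variance, which is the negative effect the abstract refers to.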
In the second part of the thesis we apply machine learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system that analyzes the behavior of animals in videos with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), detects movemes, actions, and stories from time series describing the positions of animals in videos. The method summarizes the data and provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate between groups of animals, for example according to their genetic line.
Abstract:
Let {Z_n}_{n=-∞}^{∞} be a stochastic process with state space S₁ = {0, 1, …, D−1}. Such a process is called a chain of infinite order. The transitions of the chain are described by the functions
Q_i(i^{(0)}) = P(Z_n = i | Z_{n−1} = i^{(0)}_1, Z_{n−2} = i^{(0)}_2, …)   (i ∈ S₁),
where i^{(0)} = (i^{(0)}_1, i^{(0)}_2, …) ranges over infinite sequences from S₁. If i^{(n)} = (i^{(n)}_1, i^{(n)}_2, …) for n = 1, 2, …, then i^{(n)} → i^{(0)} means that, for each k, i^{(n)}_k = i^{(0)}_k for all n sufficiently large.
Given functions Q_i(i^{(0)}) such that
(i) 0 ≤ Q_i(i^{(0)}) ≤ ξ < 1,
(ii) ∑_{i=0}^{D−1} Q_i(i^{(0)}) ≡ 1,
(iii) Q_i(i^{(n)}) → Q_i(i^{(0)}) whenever i^{(n)} → i^{(0)},
we prove the existence of a stationary chain of infinite order {Z_n} whose transitions are given by
P(Z_n = i | Z_{n−1}, Z_{n−2}, …) = Q_i(Z_{n−1}, Z_{n−2}, …)
with probability 1. The method also yields stationary chains {Z_n} for which (iii) does not hold but whose transition probabilities are, in a sense, "locally Markovian." These and similar results extend a paper by T. E. Harris [Pac. J. Math., 5 (1955), 707-724].
Included is a new proof of the existence and uniqueness of a stationary absolute distribution for an Nth order Markov chain in which all transitions are possible. This proof allows us to achieve our main results without the use of limit theorem techniques.
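The Markov-chain fact cited above, that an Nth order chain in which all transitions are possible has a unique stationary distribution, can be illustrated numerically in the simplest first-order case. The sketch below is an illustration of the statement, not of the paper's proof technique, and the transition matrix is made up for the example: since every entry of P is positive, iterating the chain from any starting distribution converges to the unique fixed point of π P = π.

```python
import numpy as np

# A 3-state transition matrix with all transitions possible (every entry > 0).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

pi = np.full(3, 1.0 / 3.0)   # arbitrary starting distribution
for _ in range(200):         # power iteration; convergence is geometric
    pi = pi @ P

print(pi)        # the unique stationary distribution
print(pi @ P)    # unchanged by a further step: pi P = pi
```

An Nth order chain reduces to this case by taking the state to be the last N symbols, so the same fixed-point argument applies.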
Abstract:
Part I
Present experimental data on nucleon-antinucleon scattering allow a study of the possibility of a phase transition in a nucleon-antinucleon gas at high temperature. Estimates can be made of the general behavior of the elastic phase shifts without resorting to theoretical derivation. A phase transition which separates nucleons from antinucleons is found at about 280 MeV in the approximation of the second virial coefficient to the free energy of the gas.
Part II
The parton model is used to derive scaling laws for the hadrons observed in deep inelastic electron-nucleon scattering which lie in the fragmentation region of the virtual photon. Scaling relations are obtained in the Bjorken and Regge regions. It is proposed that the distribution functions become independent of both q² and ν where the Bjorken and Regge regions overlap. The quark density functions are discussed in the limit x → 1 for the nucleon octet and the pseudoscalar mesons. Under certain plausible assumptions it is found that only one or two of the six types of quarks and antiquarks have an appreciable density function in the limit x → 1. This has implications for the quark fragmentation functions near the large-momentum boundary of their fragmentation region. These results are used to propose a method of measuring the proton and neutron quark density functions for all x by making measurements on inclusively produced hadrons in electroproduction only. Implications are also discussed for the hadrons produced in electron-positron annihilation.
Abstract:
By means of the Huygens-Fresnel diffraction integral, the field representation of a laser beam modulated by a hard-edged aperture is derived. The near-field and far-field transverse intensity distributions of beams with different bandwidths are analyzed using this representation. The numerical results indicate that the amplitudes and number of the intensity spikes decrease with increasing bandwidth, and beam smoothing is achieved in the near field when the bandwidth reaches a certain value. In the far field, the radius of the transverse intensity distribution decreases as the bandwidth increases, and a physical explanation of this fact is also given. (c) 2005 Optical Society of America.
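The near-field intensity spikes that a hard-edged aperture produces can be reproduced with a direct quadrature of the Huygens-Fresnel integral. The sketch below is a minimal monochromatic 1D illustration, not the paper's broadband model; the wavelength, slit width, and distance are assumed values chosen so that the Fresnel number is large (near field).

```python
import numpy as np

lam = 1.0e-6            # wavelength: 1 micron (illustrative)
k = 2 * np.pi / lam
a = 1.0e-3              # slit half-width: 1 mm
z = 0.1                 # observation distance: 0.1 m

xp = np.linspace(-a, a, 4000)          # aperture coordinate
dxp = xp[1] - xp[0]
x = np.linspace(-2 * a, 2 * a, 400)    # observation coordinate

# Direct quadrature of the 1D Fresnel diffraction integral for a unit plane
# wave truncated by the slit.
U = np.array([np.sum(np.exp(1j * k * (xo - xp) ** 2 / (2 * z))) * dxp
              for xo in x])
I = np.abs(U) ** 2 / (lam * z)         # intensity, normalized so the incident level is ~1

# Fresnel number a^2 / (lambda z) >> 1 marks the near field, where the
# characteristic edge-diffraction ripples (intensity spikes) appear.
print(a ** 2 / (lam * z))
```

Repeating the calculation for a range of wavelengths and averaging the intensities is the simplest way to see the spike-smoothing effect of a finite bandwidth that the abstract describes.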
Abstract:
Starting from the Huygens-Fresnel diffraction integral and the Fourier transform, the propagation expression for a chirped pulse passing through a hard-edged aperture is derived. Using the obtained expression, the intensity distributions of pulses with different chirps in the near and far fields are analyzed in detail. Due to the modulation of the aperture, many intensity peaks emerge in the near-field intensity distribution of the chirped pulse. However, the amplitudes of the intensity peaks decrease with increasing chirp, which results in a smoothing of the intensity distribution. This beam smoothing brought about by increasing the chirp is explained physically. It is also found that, in the far field, the radius of the intensity distribution of the chirped pulse decreases as the chirp increases. (c) 2005 Elsevier GmbH. All rights reserved.
Abstract:
We present a novel hollow-core holey fibre with a random distribution of air holes in the cladding. Our experiments demonstrate that many of the features previously attributed to photonic crystal fibres with a perfect arrangement of air holes, in particular photonic bandgap guidance, can also be obtained in this fibre. Additionally, the fibre exhibits a second guided mode with two two-lobe patterns, each in a different colour.
Abstract:
Mathematical models for heated water outfalls were developed for three flow regions. Near the source, the subsurface discharge into stratified ambient water issuing from a row of buoyant jets was solved with jet interference included in the analysis. The analysis of the flow zone close to and at intermediate distances from a surface buoyant jet was developed for the two-dimensional and axisymmetric cases. Far from the source, a passive dispersion model was solved for a two-dimensional situation, taking into account the effects of shear current and vertical changes in diffusivity. A significant result from the surface buoyant jet analysis is the ability to predict the onset and location of an internal hydraulic jump. The prediction can be made simply from knowledge of the source Froude number and a dimensionless surface exchange coefficient. Parametric computer programs for the above models were also developed as part of this study. This report was submitted in fulfillment of Contract No. 14-12-570 under the sponsorship of the Federal Water Quality Administration.
Abstract:
The abundances and distributions of coastal pelagic fish species in the California Current Ecosystem, from San Diego to southern Vancouver Island, were estimated from combined acoustic and trawl surveys conducted in the spring of 2006, 2008, and 2010. Pacific sardine (Sardinops sagax), jack mackerel (Trachurus symmetricus), and Pacific mackerel (Scomber japonicus) were the dominant coastal pelagic fish species, in that order. Northern anchovy (Engraulis mordax) and Pacific herring (Clupea pallasii) were sampled only sporadically, and therefore estimates for these species were unreliable. The estimates of sardine biomass compared well with those of the annual assessments and confirmed a declining trajectory of the "northern stock" since 2006. During the sampling period, the biomass of jack mackerel was stable or increasing, and that of Pacific mackerel was low and variable. The uncertainties in these estimates are mostly the result of spatial patchiness, which increased from sardine to mackerels to anchovy and herring. Future surveys of coastal pelagic fish species in the California Current Ecosystem should benefit from adaptive sampling based on modeled habitat; increased echosounder and trawl sampling, particularly for the most patchy and nearshore species; and directed-trawl sampling for improved species identification and estimation of their acoustic target strength.
Abstract:
When estimating parameters that constitute a discrete probability distribution {p_j}, it is difficult to determine how constraints should be imposed to guarantee that the estimated parameters {p̂_j} constitute a probability distribution (i.e., p̂_j ≥ 0, Σ p̂_j = 1). For age distributions estimated from mixtures of length-at-age distributions, the EM (expectation-maximization) algorithm (Hasselblad, 1966; Hoenig and Heisey, 1987; Kimura and Chikuni, 1987), restricted least squares (Clark, 1981), and weak quasi-solutions (Troynikov, 2004) have all been used. Each of these methods appears to guarantee that the estimated distribution will be a true probability distribution, with all categories greater than or equal to zero and with individual probabilities that sum to one. In addition, all these methods appear to provide a theoretical basis for solutions that will be either maximum-likelihood estimates or at least convergent to a probability distribution.
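The property discussed above, that EM keeps the estimated proportions nonnegative and summing to one at every iteration, can be seen in a minimal sketch. The length-at-age densities and counts below are made up for illustration and are not from any of the cited studies; each update multiplies nonnegative quantities and renormalizes, so the constraints hold by construction.

```python
import numpy as np

# f[j, l]: probability of length bin l given age class j (rows sum to 1).
f = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# Observed counts of fish in each length bin.
n = np.array([300.0, 400.0, 300.0])

p = np.full(3, 1.0 / 3.0)            # initial age proportions
for _ in range(2000):
    mix = p @ f                      # current mixture density over length bins
    # E-step: posterior age membership p_j f[j,l] / mix_l for each length bin;
    # M-step: expected age counts, renormalized to proportions.
    p = (f * p[:, None] / mix) @ n / n.sum()

print(p, p.sum())                    # a valid probability vector at every iteration
```

At convergence the fitted mixture p @ f matches the observed length composition n / n.sum(), which is the maximum-likelihood condition when such a fit is attainable with valid proportions.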