9 results for subset sum problems

at Duke University


Relevance: 30.00%

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for scaling down the problem is to first partition the dataset into subsets and then fit the model using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge lies in designing an algorithm with low communication, theoretical guarantees, and excellent practical performance in general settings. For sample space partitioning, I propose the MEdian Selection Subset AGgregation Estimator (message) algorithm to address these issues. The algorithm applies feature selection in parallel on each subset using regularized regression or a Bayesian variable selection method, calculates the "median" feature inclusion index, estimates coefficients for the selected features in parallel on each subset, and then averages these estimates. The algorithm is simple, requires minimal communication, scales efficiently with sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to common competing methods.
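A minimal plain-Python sketch of the four message steps, under stated assumptions: rows are split into m contiguous subsets; a hypothetical correlation-screening rule `select` (threshold 0.3) stands in for the regularized regression or Bayesian variable selection the abstract names; the "median" rule keeps a feature when the median of its 0/1 selection indicators across subsets is at least 1/2 (an assumed tie rule for even m); and per-subset coefficients come from ordinary least squares without an intercept. The list comprehensions over subsets are where a real implementation would run in parallel.

```python
import math
import statistics


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ols(X, y, cols):
    """Least-squares coefficients of y on the columns listed in `cols` (no intercept)."""
    gram = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    return solve(gram, rhs)


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def select(X, y, threshold):
    """Hypothetical stand-in selector: keep feature j if |corr(x_j, y)| > threshold."""
    return {j for j in range(len(X[0]))
            if abs(corr([row[j] for row in X], y)) > threshold}


def message(X, y, m, threshold=0.3):
    """Sketch of median selection subset aggregation over m row subsets."""
    n, p = len(X), len(X[0])
    bounds = [round(k * n / m) for k in range(m + 1)]
    subsets = [(X[bounds[k]:bounds[k + 1]], y[bounds[k]:bounds[k + 1]]) for k in range(m)]
    # Step 1: feature selection on each subset (parallelizable).
    selections = [select(Xs, ys, threshold) for Xs, ys in subsets]
    # Step 2: "median" inclusion index; keep j if the median 0/1 indicator is >= 1/2.
    keep = [j for j in range(p)
            if statistics.median([1 if j in s else 0 for s in selections]) >= 0.5]
    # Step 3: estimate coefficients of the kept features on each subset (parallelizable).
    estimates = [ols(Xs, ys, keep) for Xs, ys in subsets]
    # Step 4: average the subset estimates; dropped features get coefficient 0.
    beta = [0.0] * p
    for idx, j in enumerate(keep):
        beta[j] = sum(est[idx] for est in estimates) / m
    return keep, beta
```

For example, with noiseless data generated from y = 2x_0 - 3x_1 plus an irrelevant x_2, `message(X, y, m=3)` should keep features 0 and 1 and return coefficients close to (2, -3, 0). Only the selected feature indices and the subset estimates cross subset boundaries, which is the low-communication property the abstract emphasizes.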

While sample space partitioning is useful for datasets with a large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient at reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that, by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments illustrate the performance of the new framework.
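A short worked sketch of why a decorrelation step lets each worker proceed independently. It assumes (the abstract does not give the formula) that decorrelation means left-multiplying both the n × p design X and the response Y by F = √p (XX^T + rI_n)^{-1/2}, with a ridge parameter r ≥ 0; the block notation X^{(i)}, β^{(i)} is introduced here for illustration.

```latex
F = \sqrt{p}\,\bigl(XX^\top + r I_n\bigr)^{-1/2}, \qquad
\tilde{X} = F X, \qquad \tilde{Y} = F Y .

% With r = 0 and XX^\top invertible (so n \le p):
\tilde{X}\tilde{X}^\top
  = p\,(XX^\top)^{-1/2}\, XX^\top \,(XX^\top)^{-1/2}
  = p\, I_n .

% Split the columns into m blocks, X = [X^{(1)}, \dots, X^{(m)}], \beta = (\beta^{(1)}, \dots, \beta^{(m)}).
% The linear model Y = X\beta + \varepsilon becomes
\tilde{Y} = \sum_{i=1}^{m} \tilde{X}^{(i)} \beta^{(i)} + F\varepsilon,
\qquad \tilde{X}^{(i)} = F X^{(i)},
```

so worker i can fit the shared Ỹ on its own block X̃^{(i)}. With r > 0 the rows of X̃ are only approximately orthogonal.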

For datasets with both a large sample size and high dimensionality, I propose a new divide-and-conquer framework, DEME (DECO-message), which leverages both the DECO and message algorithms. The framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of each cube using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each small enough to be stored and fitted on a single computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.
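A minimal sketch of only the partitioning step described above, assuming near-equal contiguous splits: rows are cut into row cubes (the sample-space step, as in message), and each cube's columns are cut into blocks (the feature-space step, as in DECO), so the n × p matrix becomes a grid of small blocks. The function names are hypothetical; the per-block fitting and the reverse-order synthesis through DECO and message are not shown.

```python
def split_ranges(length, parts):
    """Split range(length) into `parts` contiguous, near-equal index ranges."""
    bounds = [round(k * length / parts) for k in range(parts + 1)]
    return [range(bounds[k], bounds[k + 1]) for k in range(parts)]


def deme_blocks(X, row_parts, col_parts):
    """Partition X (a list of rows) into a row_parts x col_parts grid of blocks.

    Rows are split first (sample space), then each row cube's columns are
    split (feature space). Returns a dict keyed by (row_cube, col_block).
    """
    n, p = len(X), len(X[0])
    blocks = {}
    for a, rows in enumerate(split_ranges(n, row_parts)):
        cube = [X[i] for i in rows]  # one row cube
        for b, cols in enumerate(split_ranges(p, col_parts)):
            blocks[(a, b)] = [[row[j] for j in cols] for row in cube]
    return blocks
```

For example, splitting a 7 × 5 matrix with `row_parts=3` and `col_parts=2` yields 6 blocks, and stitching each row cube's blocks side by side and stacking the cubes recovers the original matrix.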

Relevance: 20.00%

Abstract:

Successfully predicting the frequency dispersion of electronic hyperpolarizabilities is an unresolved challenge in materials science and electronic structure theory. We show that the generalized Thomas-Kuhn sum rules, combined with linear absorption data and measured hyperpolarizability at one or two frequencies, may be used to predict the entire frequency-dependent electronic hyperpolarizability spectrum. This treatment includes two- and three-level contributions that arise from the lowest two or three excited electronic state manifolds, enabling us to describe the unusual observed frequency dispersion of the dynamic hyperpolarizability in high oscillator strength M-PZn chromophores, where (porphinato)zinc(II) (PZn) and metal(II)polypyridyl (M) units are connected via an ethyne unit that aligns the high oscillator strength transition dipoles of these components in a head-to-tail arrangement. We show that some of these structures can possess very similar linear absorption spectra yet manifest dramatically different frequency-dependent hyperpolarizabilities, because of three-level contributions that result from excited-state-to-excited-state transition dipoles among charge-polarized states. Importantly, this approach provides a quantitative scheme to use linear optical absorption spectra and very limited individual hyperpolarizability measurements to predict the entire frequency-dependent nonlinear optical response. Copyright © 2010 American Chemical Society.

Relevance: 20.00%

Abstract:

The relation between social rejection and growth in antisocial behavior was investigated. In Study 1, 259 boys and girls (34% African American) were followed from Grades 1 to 3 (ages 6-8 years) to Grades 5 to 7 (ages 10-12 years). Early peer rejection predicted growth in aggression. In Study 2, 585 boys and girls (16% African American) were followed from kindergarten to Grade 3 (ages 5-8 years), and the findings were replicated. Furthermore, early aggression moderated the effect of rejection, such that rejection exacerbated antisocial development only among children initially disposed toward aggression. In Study 3, social information-processing patterns measured in Study 1 were found to partially mediate the effect of early rejection on later aggression. In Study 4, processing patterns measured in Study 2 replicated the mediation effect. Findings are integrated into a recursive model of antisocial development.

Relevance: 20.00%

Abstract:

Externalizing behavior problems of 124 adolescents were assessed across Grades 7-11. In Grade 9, participants were also assessed across social-cognitive domains after imagining themselves as the object of provocations portrayed in six videotaped vignettes. Participants responded to vignette-based questions representing multiple processes of the response decision step of social information processing. Phase 1 of our investigation supported a two-factor model of the response evaluation process of response decision (response valuation and outcome expectancy). Phase 2 showed significant relations between the set of these response decision processes, as well as response selection, measured in Grade 9 and (a) externalizing behavior in Grade 9 and (b) externalizing behavior in Grades 10-11, even after controlling for externalizing behavior in Grades 7-8. These findings suggest that on-line behavioral judgments about aggression play a crucial role in the maintenance and growth of aggressive response tendencies in adolescence.

Relevance: 20.00%

Abstract:

© 2015, Institute of Mathematical Statistics. All rights reserved. In order to use persistence diagrams as a true statistical tool, it would be very useful to have a good notion of mean and variance for a set of diagrams. In [23], Mileyko and his collaborators made the first study of the properties of the Fréchet mean in $(D_p, W_p)$, the space of persistence diagrams equipped with the $p$-th Wasserstein metric. In particular, they showed that the Fréchet mean of a finite set of diagrams always exists but is not necessarily unique. The means of a continuously varying set of diagrams do not themselves (necessarily) vary continuously, which presents obvious problems when trying to extend the Fréchet mean definition to the realm of time-varying persistence diagrams, better known as vineyards. We fix this problem by altering the original definition of the Fréchet mean so that it becomes a probability measure on the set of persistence diagrams; in a nutshell, the mean of a set of diagrams is a weighted sum of atomic measures, where each atom is itself a persistence diagram determined using a perturbation of the input diagrams. This definition gives, for each $N$, a map $(D_p)^N \to \mathbb{P}(D_p)$. We show that this map is Hölder continuous on finite diagrams and thus can be used to build a useful statistic on vineyards.
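To make the metric space $(D_p, W_p)$ concrete, here is a brute-force sketch of the $p$-th Wasserstein distance between two small finite diagrams. It assumes the L∞ ground distance (a common convention that the abstract does not specify) and lets any point be matched to the diagonal at the cost of its L∞ distance to it. Because it enumerates every matching, it is only usable for a handful of points, and it does not implement the Fréchet mean or the probabilistic mean proposed above.

```python
import itertools
import math


def wasserstein_p(D1, D2, p=2.0):
    """Brute-force p-Wasserstein distance between finite persistence diagrams.

    Diagrams are lists of (birth, death) pairs. Assumes the L-infinity ground
    distance. Exponential in diagram size; for tiny examples only.
    """
    def to_diag(pt):
        # L-infinity distance from (b, d) to the diagonal b == d.
        return (pt[1] - pt[0]) / 2.0

    def dist(a, b):
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    n, m = len(D1), len(D2)
    size = n + m

    # Rows: points of D1, then m diagonal slots.
    # Columns: points of D2, then n diagonal slots.
    def cost(i, j):
        if i < n and j < m:
            return dist(D1[i], D2[j]) ** p
        if i < n:
            return to_diag(D1[i]) ** p   # D1 point sent to the diagonal
        if j < m:
            return to_diag(D2[j]) ** p   # D2 point sent to the diagonal
        return 0.0                       # diagonal matched to diagonal

    best = math.inf
    for perm in itertools.permutations(range(size)):
        best = min(best, sum(cost(i, perm[i]) for i in range(size)))
    return best ** (1.0 / p)
```

For instance, a single point (0, 2) compared with an empty diagram must be matched to the diagonal at L∞ distance 1, so `wasserstein_p([(0, 2)], [])` is 1.0 for any p.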

Relevance: 20.00%

Abstract:

We all experience a host of common life stressors, such as the death of a family member, medical illness, and financial uncertainty. While most of us are resilient to such stressors and continue to function normally, for a subset of individuals these stressors increase the likelihood of developing treatment-resistant, chronic psychological problems, including depression and anxiety. It is thus paramount to identify predictive markers of risk, particularly those reflecting fundamental biological processes that can be targets for intervention and prevention. Using data from a longitudinal study of 340 healthy young adults, we demonstrate that individual differences in threat-related amygdala reactivity predict psychological vulnerability to life stress occurring as much as 1 to 4 years later. These results highlight a readily assayed biomarker, threat-related amygdala reactivity, that predicts psychological vulnerability to commonly experienced stressors and represents a discrete target for intervention and prevention.