986 results for sample complexity
Abstract:
Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets are sampling only some of the data that could be sampled, and doing so in a biased fashion; but biased samples could be smaller, or larger. Smaller data sets in general provide less reliable bases for estimating models, and thus could lead to inferior model performance. In this setting, we ask the question: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset which is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.
Abstract:
This study expanded the earlier work conducted by this laboratory (Hasking, P.A. and Oei, T.P.S. (2002a). The differential role of alcohol expectancies, drinking refusal self-efficacy and coping resources in predicting alcohol consumption in community and clinical samples. Addiction Research and Theory, 10, 465-494) by examining the independent and interactive effects of avoidant coping strategies, positive and negative expectancies and self-efficacy in predicting volume and frequency of alcohol consumption in a sample of community drinkers. Differential relationships were found between the variables when predicting the two consumption measures. Specifically, while self-efficacy, seeking social support for emotional reasons and using drugs or alcohol to cope were independently related to both volume and frequency of drinking, complex interactions with positive and negative alcohol expectancies were also found. These interactions are discussed in terms of the cognitive and behavioural mechanisms thought to underlie drinking behaviour.
Abstract:
Immigrants from the West Indies and other nations challenge the simple United States dichotomy of blacks versus whites. Many apparently black Caribbean immigrants proclaim that they did not know they were “black” until they arrived in the U.S. They seek to maintain their national identity and resist identity and solidarity with Black Americans. In response, many Black Americans argue that the immigrants are simply being naive, that U.S. society demands simple racial identity. Their thinking was that, regardless of one's self-identity and personal history, in the U.S., if you look black, you are black. This study examines the contemporary struggle of identity and solidarity among and between Black Americans and Jamaicans living in South Florida (Broward and Miami-Dade counties). Even though the primary focus of this study is to examine the relationship between Black Americans and Jamaicans, other West Indian nationals will be addressed more generally. The primary research problem of this study is to determine why the existence of common ancestry and physical traits is insufficient for an assumption of ethnic solidarity between Black Americans and Jamaicans. In examining this problem, I felt that depth rather than breadth would provide insight into the current state of polarization between Black Americans and Jamaicans. To this end, a qualitative study was designed. A non-random snowball sample consisting of forty-seven informants was selected for this study. Although such a technique presents problems with generalizations beyond the sample, this approach was, nonetheless, the most suitable for the current research problem. One of the initial challenges of this research was the use of the label “black” in discussing Caribbean immigrants. In America, distinctions based on skin color were at the bedrock of the nation's formation; this was not the case in the Caribbean. In the Caribbean, skin color was an important marker of class rather than of race. Therefore, I refrained from using the label “black Jamaicans” and instead used “Jamaicans” throughout.
Abstract:
While most studies take a dyadic view when examining the environmental difference between the home country of a multinational enterprise (MNE) and a particular foreign country, they ignore that an MNE is managing a network of subsidiaries embedded in diverse environments. Additionally, neither the impacts of global environments on top executives nor the effects of top executives’ capabilities to handle institutional complexity are fully explored. Thus, using a three-essay format, this dissertation tried to fill these gaps by addressing the effects of institutional complexity and top management characteristics on top executive compensation and firm performance. Essay 1 investigated the impact of an MNE’s institutional complexity, or the diversity of national institutions facing an MNE’s network of subsidiaries, on top management team (TMT) compensation. This essay proposed that greater political and cultural complexity leads not only to greater TMT total compensation but also to a greater portion of TMT compensation linked with long-term performance. The arguments are supported in this essay by using an unbalanced panel dataset including 296 U.S. firms with 1,340 observations. Essay 2 explored TMT social capital and its moderating role on value creation and appropriation by the chief executive officer (CEO). Using a sample with 548 U.S. firms and 2,010 observations, it found that greater TMT social capital does facilitate the effects of CEO intellectual capital and social capital on firm growth. Finally, essay 3 examined the performance implications for the fit between managerial information-processing capabilities and institutional complexity. It proposed that institutional complexity is associated with the needs of information-processing. On the other hand, smaller TMT turnover and larger TMT size reflect larger managerial information-processing capabilities. Consequently, superior performance is achieved by the match among institutional complexity, TMT turnover, and TMT size. All hypotheses in essay 3 are supported in a sample of 301 U.S. firms and 1,404 observations. To conclude, this dissertation advances and extends our knowledge on the roles of institutional environments and top executives on firm performance and top executive compensation.
Abstract:
Pre-treatment HCV quasispecies complexity and diversity may predict response to interferon-based anti-viral therapy. The objective of this study was to retrospectively (1) examine temporal changes in quasispecies prior to the start of therapy and (2) investigate extensively quasispecies evolution in a group of 10 chronically infected patients with genotype 3a, treated with pegylated alpha 2a-Interferon and ribavirin. The degree of sequence heterogeneity within the hypervariable region 1 was assessed by analyzing 20-30 individual clones in serial serum samples. Genetic parameters, including amino acid Shannon entropy, Hamming distance and genetic distance, were calculated for each sample. Treatment outcome was divided into (1) sustained virological responders (SVR) and (2) treatment failure (TF). Our results indicate that (1) quasispecies complexity and diversity are lower in the SVR group, (2) quasispecies vary temporally and (3) genetic heterogeneity at baseline can be used to predict treatment outcome. We discuss the results from the perspective of replicative homeostasis.
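As an illustration of two of the genetic parameters named in this abstract, the sketch below computes Shannon entropy and mean pairwise Hamming distance for a set of clone sequences. It is a minimal sketch, not the authors' pipeline: the function names, the normalization by log(N), and the four short example sequences are all assumptions made for illustration, and the abstract does not say which entropy variant was used.

```python
from collections import Counter
from itertools import combinations
from math import log


def shannon_entropy(clones):
    """Shannon entropy (natural log) of the distribution of distinct sequences."""
    n = len(clones)
    counts = Counter(clones)
    return -sum((c / n) * log(c / n) for c in counts.values())


def normalized_entropy(clones):
    """Entropy divided by log(N), an assumed normalization giving a 0..1 range."""
    n = len(clones)
    return shannon_entropy(clones) / log(n) if n > 1 else 0.0


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(clones):
    """Average Hamming distance over all unordered pairs of clones."""
    pairs = list(combinations(clones, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned amino acid fragments, invented for this example.
example_clones = ["ETHV", "ETHV", "ETHT", "GTHV"]
print(normalized_entropy(example_clones))
print(mean_pairwise_hamming(example_clones))
```

A sample made of identical clones gives an entropy of zero, while a sample in which every clone is distinct gives a normalized entropy of one.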
(Table 1) Sample descriptions and results: Carbon, lipid, and kerogen analyses, at DSDP Leg 64 Holes
Abstract:
Pleistocene sediments in the Guaymas Basin, Gulf of California, have been intruded by sills and their organic matter thus subjected to thermal stress. Sediment samples from DSDP/IPOD Sites 477, 478, and 481, and samples of thermally unaltered materials from Sites 474 and 479 were analyzed to characterize the lipids and kerogens and to evaluate the effects of the intrusive thermal stresses. The lipids of the thermally unaltered samples are derived from microbial and terrestrial higher-plant detritus. The samples from the sill proximities contain the distillates, and those adjacent to the sills contain essentially no lipids. The pyrograms of the kerogens from the unaltered samples reflect their predominantly autochthonous microbial origin. When compared with the unaltered samples, the pyrograms of the altered kerogen samples reflect the thermal effects by a reduction in the complexity of the products. Kerogens adjacent to the sills produced little or no pyrolysis products. The effects of intrusions into unconsolidated, wet sediments resulted in in situ pyrolysis of the organic matter, as confirmed by these data.
Abstract:
The ability to estimate the impact of ongoing climate change on the hydrological behaviour of hydrosystems is necessary for anticipating the inevitable adaptations that our societies must consider. In this context, this doctoral project presents a study evaluating the sensitivity of future hydrological projections to: (i) the non-robustness of hydrological model parameter identification, (ii) the use of several equifinal parameter sets, and (iii) the use of different hydrological model structures. To quantify the impact of the first source of uncertainty on model outputs, four climatically contrasted sub-periods are first identified within the observed records. The models are calibrated on each of these four periods, and the resulting outputs are analysed in calibration and validation following the four configurations of the Differential Split-Sample Test (Klemeš, 1986; Wilby, 2005; Seiller et al., 2012; Refsgaard et al., 2014). To study the second source of uncertainty, linked to the model structure, the equifinality of parameter sets is then taken into account by considering, for each type of calibration, the outputs associated with equifinal parameter sets. Finally, to evaluate the third source of uncertainty, five hydrological models of different levels of complexity (GR4J, MORDOR, HSAMI, SWAT and HYDROTEL) are applied to the Au Saumon River catchment in Quebec. The three sources of uncertainty are evaluated both under past observed climate conditions and under future climate conditions. The results show that, given the evaluation method followed in this doctoral work, the use of hydrological models of different levels of complexity is the main source of variability in streamflow projections under future climate conditions. This is followed by the lack of robustness of parameter identification. The hydrological projections generated by an ensemble of equifinal parameter sets are close to those associated with the optimal parameter set. Consequently, more effort should be invested in improving the robustness of models for climate change impact studies, notably by developing more appropriate model structures and by proposing calibration procedures that increase their robustness. This work provides a detailed answer regarding our capacity to diagnose the impacts of climate change on the water resources of the Au Saumon catchment and proposes an original methodological approach to analysis that can be directly applied or adapted to other hydro-climatic contexts.
Abstract:
Background: Many acute stroke trials have given neutral results. Sub-optimal statistical analyses may be failing to detect efficacy. Methods which take account of the ordinal nature of functional outcome data are more efficient. We compare sample size calculations for dichotomous and ordinal outcomes for use in stroke trials. Methods: Data from stroke trials studying the effects of interventions known to positively or negatively alter functional outcome (Rankin Scale and Barthel Index) were assessed. Sample size was calculated using comparisons of proportions, means, medians (according to Payne), and ordinal data (according to Whitehead). The sample sizes gained from each method were compared using Friedman two-way ANOVA. Results: Fifty-five comparisons (54 173 patients) of active vs. control treatment were assessed. Estimated sample sizes differed significantly depending on the method of calculation (P < 0.0001). The ordering of the methods showed that the ordinal method of Whitehead and comparison of means produced significantly lower sample sizes than the other methods. The ordinal data method on average reduced sample size by 28% (inter-quartile range 14–53%) compared with the comparison of proportions; however, a 22% increase in sample size was seen with the ordinal method for trials assessing thrombolysis. The comparison of medians method of Payne gave the largest sample sizes. Conclusions: Choosing an ordinal rather than binary method of analysis allows most trials to be, on average, smaller by approximately 28% for a given statistical power. Smaller trial sample sizes may help by reducing time to completion, complexity, and financial expense. However, ordinal methods may not be optimal for interventions which both improve functional outcome
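For readers unfamiliar with the dichotomous baseline in this comparison, the sketch below shows a textbook normal-approximation sample size calculation for comparing two proportions. It is an assumption that this matches the form used in the study; the abstract does not give the formula, the function name and example proportions are hypothetical, and the ordinal method of Whitehead is not implemented here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients per arm needed to detect p1 vs p2 with a two-sided test.

    Uses the unpooled normal approximation:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    """
    if p1 == p2:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical example: 50% poor outcome in control vs 40% with treatment.
print(n_per_group_two_proportions(0.50, 0.40))
```

Because the required size scales with the inverse square of the difference in proportions, halving the detectable effect roughly quadruples the number of patients, which is why analysis methods that extract more information per patient can matter.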
Abstract:
Raman spectroscopy has been used to study a selection of vivianites from different origins. A band is identified at around 3480 cm–1 whose intensity is sample dependent. The band is attributed to the stretching vibration of Fe3+OH units which are formed through the autooxidation of the vivianite minerals either by self-oxidation or by photocatalytic oxidation according to the reaction: (Fe2+)3(PO4)2·8H2O + 1/2O2 → (Fe2+)3–x(Fe3+)x(PO4)2(OH)x·(8–x)H2O, in which some of the water of crystallization is converted to hydroxyl anions. Complexity of the OH stretching region through the overlap of broad bands is reflected in the water HOH deformation modes at 1660 cm–1. Using the infrared bands at 3281, 3105 and 3025 cm–1, hydrogen bond distances of 2.734(5), 2.675(2) and 2.655(2) Å are calculated. Vivianites are characterised by an intense band at 950 cm–1 assigned to the PO4 symmetric stretching vibration. Low Raman intensity bands are observed at ~1077, ~1050, 1015 and ~985 cm–1 assigned to the phosphate PO4 antisymmetric stretching vibrations. Multiple antisymmetric stretching vibrations are due to the reduced tetrahedral symmetry. This loss of degeneracy is also reflected in the bending modes. Two bands are observed at ~423 and ~456 cm–1 assigned to the ν2 bending modes. For the vivianites four bands are observed at ~584, ~571, ~545 and ~525 cm–1 assigned to the ν4 modes of vivianite.
Abstract:
We generalize the classical notion of Vapnik–Chervonenkis (VC) dimension to ordinal VC-dimension, in the context of logical learning paradigms. Logical learning paradigms encompass the numerical learning paradigms commonly studied in Inductive Inference. A logical learning paradigm is defined as a set W of structures over some vocabulary, and a set D of first-order formulas that represent data. The sets of models of ϕ in W, where ϕ varies over D, generate a natural topology W over W. We show that if D is closed under boolean operators, then the notion of ordinal VC-dimension offers a perfect characterization for the problem of predicting the truth of the members of D in a member of W, with an ordinal bound on the number of mistakes. This shows that the notion of VC-dimension has a natural interpretation in Inductive Inference, when cast into a logical setting. We also study the relationships between predictive complexity, selective complexity (a variation on predictive complexity), and mind change complexity. The assumptions that D is closed under boolean operators and that W is compact often play a crucial role in establishing connections between these concepts. We then consider a computable setting with effective versions of the complexity measures, and show that the equivalence between ordinal VC-dimension and predictive complexity fails. More precisely, we prove that the effective ordinal VC-dimension of a paradigm can be defined when all other effective notions of complexity are undefined. On a better note, when W is compact, all effective notions of complexity are defined, though they are not related as in the noncomputable version of the framework.
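The classical notion that this abstract generalizes can be made concrete on a small finite example. The sketch below is a brute-force computation of the ordinary (finite) VC dimension of a hypothesis class over a finite domain; it does not implement the ordinal VC-dimension of the paper, it assumes a non-empty class, and the function names and the threshold and interval classes are illustrative assumptions.

```python
from itertools import combinations


def is_shattered(points, hypotheses):
    """True if the hypotheses realize every one of the 2^k labelings of points."""
    labelings = {tuple(p in h for p in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)


def vc_dimension(domain, hypotheses):
    """Size of the largest subset of domain shattered by the hypotheses.

    Stops at the first size k with no shattered subset, which is safe because
    every subset of a shattered set is itself shattered.
    """
    dimension = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            dimension = k
        else:
            break
    return dimension


domain = [1, 2, 3, 4]
# Thresholds: all points greater than or equal to t.
thresholds = [frozenset(x for x in domain if x >= t) for t in range(1, 6)]
# Intervals: all points between a and b inclusive (empty when a > b).
intervals = [frozenset(x for x in domain if a <= x <= b)
             for a in domain for b in domain]

print(vc_dimension(domain, thresholds))
print(vc_dimension(domain, intervals))
```

Thresholds cannot label a smaller point positive and a larger one negative, and intervals cannot label the two outer points of a triple positive while excluding the middle one, which is what bounds the two dimensions in this example.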