954 results for size-selection


Relevance:

30.00%

Publisher:

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is suitable for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease of accuracy is a more reliable variable ranking measure than the mean decrease of Gini.

Relevance:

30.00%

Publisher:

Abstract:

Sizes and power of selected two-sample tests of the equality of survival distributions are compared by simulation for small samples from unequally, randomly-censored exponential distributions. The tests investigated include parametric tests (F, Score, Likelihood, Asymptotic), logrank tests (Mantel, Peto-Peto), and Wilcoxon-type tests (Gehan, Prentice). Equal sized samples, n = 8, 16, 32 with 1000 (size) and 500 (power) simulation trials, are compared for 16 combinations of the censoring proportions 0%, 20%, 40%, and 60%. For n = 8 and 16, the Asymptotic, Peto-Peto, and Wilcoxon tests perform at nominal 5% size expectations, but the F, Score and Mantel tests exceeded 5% size confidence limits for 1/3 of the censoring combinations. For n = 32, all tests showed proper size, with the Peto-Peto test most conservative in the presence of unequal censoring. Powers of all tests are compared for exponential hazard ratios of 1.4 and 2.0. There is little difference in power characteristics of the tests within the classes of tests considered. The Mantel test showed 90% to 95% power efficiency relative to parametric tests. Wilcoxon-type tests have the lowest relative power but are robust to differential censoring patterns. A modified Peto-Peto test shows power comparable to the Mantel test. For n = 32, a specific Weibull-exponential comparison of crossing survival curves suggests that the relative powers of logrank and Wilcoxon-type tests depend on the scale parameter of the Weibull distribution. Wilcoxon-type tests appear more powerful than logrank tests in the case of late-crossing and less powerful for early-crossing survival curves. Guidelines for the appropriate selection of two-sample tests are given.
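
The kind of size simulation described in this abstract can be sketched as follows. This is a minimal illustration, not the study's actual code: it covers only a logrank (Mantel-type) statistic, assumes exponentially distributed censoring times (the abstract does not state the censoring distribution), and the function names, seed and censoring pair are hypothetical choices.

```python
import random

CHI2_1DF_95 = 3.841  # upper 5% point of the chi-square distribution, 1 d.f.

def censored_sample(n, hazard, censor_prop, rng):
    """Draw n (time, event) pairs. Exponential censoring is an assumption."""
    # With exponential event and censoring times, P(censored) = c / (hazard + c).
    rate = hazard * censor_prop / (1.0 - censor_prop) if censor_prop > 0 else 0.0
    sample = []
    for _ in range(n):
        t = rng.expovariate(hazard)
        c = rng.expovariate(rate) if rate > 0 else float("inf")
        sample.append((min(t, c), t <= c))
    return sample

def logrank_chi2(group1, group2):
    """Logrank chi-square statistic for untied (time, event) data."""
    data = sorted([(t, e, 0) for t, e in group1] + [(t, e, 1) for t, e in group2])
    at_risk = [len(group1), len(group2)]
    observed = expected = variance = 0.0
    for _, event, g in data:
        if event:
            p = at_risk[0] / (at_risk[0] + at_risk[1])
            observed += 1.0 if g == 0 else 0.0
            expected += p
            variance += p * (1.0 - p)  # hypergeometric variance for a single event
        at_risk[g] -= 1
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0

def empirical_size(n, censor_props, trials, seed):
    """Fraction of null-hypothesis trials rejected at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        g1 = censored_sample(n, 1.0, censor_props[0], rng)
        g2 = censored_sample(n, 1.0, censor_props[1], rng)
        if logrank_chi2(g1, g2) > CHI2_1DF_95:
            rejections += 1
    return rejections / trials

size_estimate = empirical_size(n=16, censor_props=(0.2, 0.6), trials=1000, seed=1)
```

Both groups share the same hazard, so every rejection is a false positive and `size_estimate` approximates the test's true size for that censoring combination.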

Relevance:

30.00%

Publisher:

Abstract:

When choosing among models to describe categorical data, the necessity to consider interactions makes selection more difficult. With just four variables, considering all interactions, there are 166 different hierarchical models and many more non-hierarchical models. Two procedures have been developed for categorical data which will produce the "best" subset or subsets of each model size, where size refers to the number of effects in the model. Both procedures are patterned after the Leaps and Bounds approach used by Furnival and Wilson for continuous data and do not generally require fitting all models. For hierarchical models, likelihood ratio statistics ($G^2$) are computed using iterative proportional fitting and "best" is determined by comparing, among models with the same number of effects, $\Pr(\chi^2_k \ge G^2_{ij})$, where $k$ is the degrees of freedom for the $i$th model of size $j$. To fit non-hierarchical as well as hierarchical models, a weighted least squares procedure has been developed. The procedures are applied to published occupational data relating to the occurrence of byssinosis. These results are compared to previously published analyses of the same data. Also, the procedures are applied to published data on symptoms in psychiatric patients and again compared to previously published analyses. These procedures will make categorical data analysis more accessible to researchers who are not statisticians. The procedures should also encourage more complex exploratory analyses of epidemiologic data and contribute to the development of new hypotheses for study.

Relevance:

30.00%

Publisher:

Abstract:

Microzooplankton (the 20 to 200 µm size class of zooplankton) is recognised as an important part of marine pelagic ecosystems. In terms of biomass and abundance, heterotrophic dinoflagellates are one of the important groups of organisms in microzooplankton. However, their rates (grazing and growth), feeding behaviour and prey preferences are poorly known and understood. A set of data was assembled in order to derive a better understanding of heterotrophic dinoflagellate rates in response to parameters such as prey concentration, prey type (size and species), temperature and their own size. With these objectives, the literature was searched for laboratory experiments that studied the effect of one or more of these parameters. The criteria for selection and inclusion in the database were: (i) a controlled laboratory experiment with a known dinoflagellate feeding on a known prey; (ii) the presence of ancillary information about the experimental conditions and the organisms used (cell volume, cell dimensions, and carbon content). Rates and ancillary information were measured in units that met each experimenter's needs, so the data units had to be harmonized after collection. In addition, different units can link to different mechanisms (carbon to the nutritive quality of the prey, volume to size limits). As a result, grazing rates are available as pg C dinoflagellate⁻¹ h⁻¹, µm³ dinoflagellate⁻¹ h⁻¹ and prey cells dinoflagellate⁻¹ h⁻¹; clearance rate was calculated if not given, and growth rate is expressed per day.

Relevance:

30.00%

Publisher:

Abstract:

When a firm decides to implement ERP software, the resulting consequences can pervade all levels, including organization, process, control and available information. Therefore, the first decision to be made is which ERP solution to adopt from a wide range of offers and vendors. To this end, this paper describes a methodology, based on multi-criteria factors that directly affect the process, to help managers make this decision. This methodology has been applied to a medium-size company in the Spanish metal transformation sector which is interested in updating its IT capabilities in order to obtain greater control of and better information about its business, thus achieving a competitive advantage. The paper proposes a decision matrix which takes into account all critical factors in ERP selection.
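
A decision matrix of this kind is often evaluated as a weighted sum of criterion ratings. The sketch below assumes that scoring rule, which the abstract does not confirm; the criteria, weights, ratings and vendor names are all hypothetical.

```python
def weighted_scores(weights, ratings):
    """Return the normalised weighted-sum score of each alternative."""
    total_weight = sum(weights.values())
    return {
        name: sum(weights[c] * row[c] for c in weights) / total_weight
        for name, row in ratings.items()
    }

# Hypothetical criteria, weights and 1-5 ratings; none come from the paper.
weights = {"functional_fit": 5, "total_cost": 3, "vendor_support": 2}
ratings = {
    "ERP_A": {"functional_fit": 4, "total_cost": 2, "vendor_support": 5},
    "ERP_B": {"functional_fit": 3, "total_cost": 5, "vendor_support": 4},
}

scores = weighted_scores(weights, ratings)
best = max(scores, key=scores.get)  # alternative with the highest score
```

Dividing by the total weight keeps scores on the same 1-5 scale as the ratings, which makes alternatives easier to compare when the weights are revised.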

Relevance:

30.00%

Publisher:

Abstract:

Publishing Linked Data is a process that involves several design decisions and technologies. Although some initial guidelines have been already provided by Linked Data publishers, these are still far from covering all the steps that are necessary (from data source selection to publication) or giving enough details about all these steps, technologies, intermediate products, etc. Furthermore, given the variety of data sources from which Linked Data can be generated, we believe that it is possible to have a single and unified method for publishing Linked Data, but we should rely on different techniques, technologies and tools for particular datasets of a given domain. In this paper we present a general method for publishing Linked Data and the application of the method to cover different sources from different domains.

Relevance:

30.00%

Publisher:

Abstract:

Knowing the size of the terms to which program variables are bound at run-time in logic programs is required in a class of applications related to program optimization such as, for example, granularity analysis and selection among different algorithms or control rules whose performance may be dependent on such size. Such size is difficult to even approximate at compile time and is thus generally computed at run-time by using (possibly predefined) predicates which traverse the terms involved. We propose a technique based on program transformation which has the potential of performing this computation much more efficiently. The technique is based on finding program procedures which are called before those in which knowledge regarding term sizes is needed and which traverse the terms whose size is to be determined, and transforming such procedures so that they compute term sizes "on the fly". We present a systematic way of determining whether a given program can be transformed in order to compute a given term size at a given program point without additional term traversal. Also, if several such transformations are possible our approach allows finding minimal transformations under certain criteria. We also discuss the advantages and applications of our technique and present some performance results.
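
The idea of computing term sizes "on the fly" can be illustrated outside logic programming. The sketch below is an analogy in Python rather than the paper's transformation: lists are modelled as nested `(head, tail)` pairs, and the function names are hypothetical. The transformed `append_with_size` returns the length of its first argument from the traversal it already performs, so no separate size pass is needed.

```python
def from_list(items):
    """Build a list term: None is the empty list, (head, tail) a cons cell."""
    term = None
    for item in reversed(items):
        term = (item, term)
    return term

def append(xs, ys):
    """Original procedure: traverses xs once and computes no size."""
    if xs is None:
        return ys
    head, tail = xs
    return (head, append(tail, ys))

def size(term):
    """Separate size computation: requires an extra traversal of the term."""
    n = 0
    while term is not None:
        n += 1
        term = term[1]
    return n

def append_with_size(xs, ys):
    """Transformed procedure: same traversal, also returns the length of xs."""
    if xs is None:
        return ys, 0
    head, tail = xs
    rest, n = append_with_size(tail, ys)
    return (head, rest), n + 1

xs = from_list([1, 2, 3])
ys = from_list([4, 5])
zs, xs_size = append_with_size(xs, ys)  # one traversal yields both results
```

A later procedure that needs the size of `xs`, for example to decide whether a task is large enough to run in parallel, can use `xs_size` directly instead of calling `size(xs)`.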

Relevance:

30.00%

Publisher:

Abstract:

Swordtail fish (Poeciliidae: genus Xiphophorus) are a paradigmatic case of sexual selection by sensory exploitation. Female preference for males with a conspicuous “sword” ornament is ancestral, suggesting that male morphology has evolved in response to a preexisting bias. The perceptual mechanisms underlying female mate choice have not been identified, complicating efforts to understand the selection pressures acting on ornament design. We consider two alternative models of receiver behavior, each consistent with previous results. Females could respond either to specific characteristics of the sword or to more general cues, such as the apparent size of potential mates. We showed female swordtails a series of computer-altered video sequences depicting a courting male. Footage of an intact male was preferred strongly to otherwise identical sequences in which portions of the sword had been deleted selectively, but a disembodied courting sword was less attractive than an intact male. There was no difference between responses to an isolated sword and to a swordless male of comparable length, or between an isolated sword and a homogeneous background. Female preference for a sworded male was abolished by enlarging the image of a swordless male to compensate for the reduction in length caused by removing the ornament. This pattern of results is consistent with mate choice being mediated by a general preference for large males rather than by specific characters. Similar processes may account for the evolution of exaggerated traits in other systems.

Relevance:

30.00%

Publisher:

Abstract:

Few experiments have demonstrated a genetic correlation between the process of sexual selection and fitness benefits in offspring, either through female choice or male competition. Those that have looked at the relationship between female choice and offspring fitness have focused on juvenile fitness components, rather than fitness at later stages in the life cycle. In addition, many of these studies have not controlled for possible maternal effects. To test for a relationship between sexual selection and adult fitness, we carried out an artificial selection experiment in the fruit fly, Drosophila melanogaster. We created two treatments that varied in the level of opportunity for sexual selection. Increased opportunity for female choice and male competition was genetically correlated with an increase in adult survivorship, as well as an increase in male and female body size. Contrary to previous, single-generation studies, we did not find an increase in larval competitive ability. This study demonstrates that mate choice and/or male–male competition are correlated with an increase in at least one adult fitness component of offspring.

Relevance:

30.00%

Publisher:

Abstract:

Starving Dictyostelium amoebae emit pulses of the chemoattractant cAMP that are relayed from cell to cell as circular and spiral waves. We have recently modeled spiral wave formation in Dictyostelium. Our model suggests that a secreted protein inhibitor of an extracellular cAMP phosphodiesterase selects for spirals. Herein we test the essential features of this prediction by comparing wave propagation in wild type and inhibitor mutants. We find that mutants rarely form spirals. The territory size of mutant strains is approximately 50 times smaller than wild type, and the mature fruiting bodies are smaller but otherwise normal. These results identify a mechanism for selecting one wave symmetry over another in an excitable system and suggest that the phosphodiesterase inhibitor may be under selection because it helps regulate territory size.

Relevance:

30.00%

Publisher:

Abstract:

Parasites have been argued to influence clutch size evolution, but past work and theory have largely focused on within-species optimization solutions rather than clearly addressing among-species variation. The effects of parasites on clutch size variation among species can be complex, however, because different parasites can induce age-specific differences in mortality that can cause clutch size to evolve in different directions. We provide a conceptual argument that differences in immunocompetence among species should integrate differences in overall levels of parasite-induced mortality to which a species is exposed. We test this assumption and show that mortality caused by parasites is positively correlated with immunocompetence measured by cell-mediated measures. Under life history theory, clutch size should increase with increased adult mortality and decrease with increased juvenile mortality. Using immunocompetence as a general assay of parasite-induced mortality, we tested these predictions by using data for 25 species. We found that clutch size increased strongly with adult immunocompetence. In contrast, clutch size decreased weakly with increased juvenile immunocompetence. However, immunocompetence of juveniles may be constrained by selection on adults, and, when we controlled for adult immunocompetence, clutch size decreased with juvenile immunocompetence. Thus, immunocompetence seems to reflect evolutionary differences in parasite virulence experienced by species, and differences in age-specific parasite virulence appear to exert opposite selection on clutch size evolution.

Relevance:

30.00%

Publisher:

Abstract:

Patterns of variation at the Sod locus of Drosophila melanogaster suggest that the protein polymorphism at this locus has very recently arisen. In addition, it appears that a previously rare DNA variant has been recently and rapidly driven to intermediate frequency. From the size of the region (>20 kb) that has been swept along with this rare variant, and patterns of linkage disequilibrium in the region, it is inferred that the strength of selection was large (s > 0.01) and that the sweep occurred more than 25,000 generations ago. In addition, there are striking similarities to patterns of variation observed at the Est6 and Est-P loci, which are located approximately 1,000 kb from Sod.

Relevance:

30.00%

Publisher:

Abstract:

Identification of individual major genes affecting quantitative traits in livestock species has been limited to date. By using a candidate gene approach and a divergent breed cross involving the Chinese Meishan pig, we have shown that a specific allele of the estrogen receptor (ER) locus is associated with increased litter size. Female pigs from synthetic lines with a 50% Meishan background that were homozygous for this beneficial allele produced 2.3 more pigs in first parities and 1.5 more pigs averaged over all parities than females from the same synthetic lines that were homozygous for the undesirable allele. This beneficial ER allele was also found in pigs with Large White breed ancestry. Analysis of females with Large White breed background showed an advantage of more than 1 total pig born for females homozygous for the beneficial allele compared with females homozygous for the other allele. Analyses of growth performance test records detected no significant unfavorable associations of the beneficial allele with growth and developmental traits. Mapping of the ER gene demonstrated that the closest known genes or markers were 3 centimorgans from ER. To our knowledge, one of these, the superoxide dismutase gene (SOD2), was mapped for the first time in the pig. Analysis of ER and these linked markers indicated that ER is the best predictor of litter size differences. Introgression of the beneficial allele into commercial pig breeding lines, in which the allele was not present, and marker-assisted selection for the beneficial allele in lines with Meishan and Large White background have begun.