892 results for Numbers, Random


Relevance:

20.00% 20.00%

Publisher:

Abstract:

An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high-dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in the form of Java-based desktop applications called “similarity mapplets”, which allow interactive content browsing and are linked to a “Multifingerprint Browser for ChEMBL” (also accessible directly at www.gdb.unibe.ch) to perform nearest neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30 300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets to help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.
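The mapplet counts quoted above are consistent with one map per fingerprint space per reference set (or pair of sets). The short check below is only a sketch: the figure of 101 eDUD active sets is an assumption inferred from 606 / 6 and is not stated in the abstract, and `mapplet_counts` is a hypothetical helper name.

```python
from math import comb

N_SPACES = 6         # MQN, SMIfp, APfp, Xfp, Sfp, ECfp4 (listed in the abstract)
N_ACTIVE_SETS = 101  # ASSUMPTION: inferred from 606 / 6, not stated in the abstract


def mapplet_counts(n_spaces, n_sets):
    """Return (maps for single sets, maps for unordered pairs of sets)."""
    single = n_spaces * n_sets
    pairs = n_spaces * comb(n_sets, 2)
    return single, pairs


print(mapplet_counts(N_SPACES, N_ACTIVE_SETS))  # (606, 30300)
```

Under that assumption both quoted figures are reproduced: 6 x 101 = 606 and 6 x C(101, 2) = 30 300.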

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Theory on plant succession predicts a temporal increase in the complexity of spatial community structure and of competitive interactions: initially random occurrences of early colonising species shift towards spatially and competitively structured plant associations in later successional stages. Here we use long-term data on early plant succession in a German post-mining area to disentangle the importance of random colonisation, habitat filtering, and competition on the temporal and spatial development of plant community structure. We used species co-occurrence analysis and a recently developed method for assessing competitive strength and hierarchies (transitive versus intransitive competitive orders) in multispecies communities. We found that species turnover decreased through time within interaction neighbourhoods, but increased through time outside interaction neighbourhoods. Successional change did not lead to modular community structure. After accounting for species richness effects, the strength of competitive interactions and the proportion of transitive competitive hierarchies increased through time. Although effects of habitat filtering were weak, random colonisation and subsequent competitive interactions had strong effects on community structure. Because competitive strength and transitivity were poorly correlated with soil characteristics, there was little evidence for context-dependent competitive strength associated with intransitive competitive hierarchies.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a parallel surrogate-based global optimization method for computationally expensive objective functions that is more effective for larger numbers of processors. To reach this goal, we integrated concepts from multi-objective optimization and tabu search into single-objective surrogate optimization. Our proposed derivative-free algorithm, called SOP, uses non-dominated sorting of points for which the expensive function has been previously evaluated. The two objectives are the expensive function value of the point and the minimum distance of the point to previously evaluated points. Based on the results of non-dominated sorting, P points from the sorted fronts are selected as centers from which many candidate points are generated by random perturbations. Based on surrogate approximation, the best candidate point is subsequently selected for expensive evaluation for each of the P centers, with simultaneous computation on P processors. Centers that previously did not generate good solutions are tabu with a given tenure. We show almost sure convergence of this algorithm under some conditions. The performance of SOP is compared with two RBF-based methods. The test results show that SOP is an efficient method that can reduce the time required to find a good near-optimal solution. In a number of cases the efficiency of SOP is so good that SOP with 8 processors found an accurate answer in less wall-clock time than the other algorithms did with 32 processors.
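The center-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' SOP implementation: it assumes minimization, Euclidean distance, and Gaussian perturbations, and it leaves out the surrogate model, the tabu list, and the parallel evaluation. All function names are hypothetical.

```python
import math
import random


def min_distance(i, points):
    """Distance from points[i] to its nearest other evaluated point."""
    return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Peel off successive non-dominated fronts; returns lists of indices."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def select_centers(points, values, n_centers):
    """Pick n_centers indices, taking points front by front.

    Objectives (both minimized): the function value, and the negated
    distance to the nearest evaluated point (so isolated points are favored).
    """
    objs = [(values[i], -min_distance(i, points)) for i in range(len(points))]
    ordered = [i for front in non_dominated_fronts(objs) for i in front]
    return ordered[:n_centers]


def perturb(center, sigma, n_candidates, rng):
    """Generate candidate points around a center (Gaussian noise is an assumption)."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in center)
            for _ in range(n_candidates)]


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
values = [1.0, 2.0, 3.0]
centers = select_centers(points, values, n_centers=2)
candidates = perturb(points[centers[0]], sigma=0.1, n_candidates=5, rng=random.Random(0))
```

In this toy input the second point is dominated by the first (worse value, same isolation), so the first and third points are chosen as centers.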

Relevance:

20.00% 20.00%

Publisher:

Abstract:

With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. Using step-wise linear regression, the best predictor of the quantitative trait was not usually the single functional mutation. Rather, it was a SNP that was in high linkage disequilibrium with the functional mutation. Next, I compared three strategies for selecting SNPs for application to phenotype association studies: based on measures of linkage disequilibrium, based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with a small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to continuously increase the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers are faced with the challenge of utilizing an appropriate statistical method that controls the type I error rate while maintaining adequate power.
We show that an empirical genotype-based multi-locus global test that uses permutation testing to investigate the null distribution of the maximum test statistic maintains a desired overall type I error rate while not overly sacrificing statistical power. The results also show that when the penetrance model is simple, the multi-locus global test does as well as or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of utility to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
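The idea of a permutation-based global test on the maximum statistic can be sketched as below. This is an illustration under stated assumptions, not the dissertation's procedure: the per-SNP statistic is taken to be the absolute Pearson correlation between genotype (coded 0/1/2) and a quantitative phenotype, and the function names are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat(genotypes, phenotype):
    """Largest per-SNP statistic; genotypes is a list of per-SNP columns."""
    return max(abs_corr(snp, phenotype) for snp in genotypes)


def global_permutation_p(genotypes, phenotype, n_perm=999, seed=0):
    """Global p-value: share of phenotype permutations whose maximum
    statistic is at least as large as the observed maximum."""
    rng = random.Random(seed)
    observed = max_stat(genotypes, phenotype)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if max_stat(genotypes, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because each permutation recomputes the maximum over all SNPs, the resulting p-value accounts for the number of SNPs tested and for the correlation among them, which is why a single threshold on it controls the overall type I error rate.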

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper examines how preference correlation and intercorrelation combine to influence the length of a decentralized matching market's path to stability. In simulated experiments, marriage markets with various preference specifications begin at an arbitrary matching of couples and proceed toward stability via the random mechanism proposed by Roth and Vande Vate (1990). The results of these experiments reveal that fundamental preference characteristics are critical in predicting how long the market will take to reach a stable matching. In particular, intercorrelation and correlation are shown to have an exponential impact on the number of blocking pairs that must be randomly satisfied before stability is attained. The magnitude of the impact is dramatically different, however, depending on whether preferences are positively or negatively intercorrelated.
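The random path to stability can be sketched as a loop that repeatedly satisfies one blocking pair chosen at random. The code below is a simplified illustration, not the paper's experimental setup: it assumes equal numbers of men and women, strict preferences, and that every partner is preferred to being single. Names such as `random_path_to_stability` are hypothetical.

```python
import random


def blocking_pairs(men_rank, women_rank, wife, husband):
    """All (m, w) who both prefer each other to their current situation.

    men_rank[m][w] is the rank m gives w (0 = most preferred); likewise
    women_rank[w][m]. A single agent prefers any partner (assumption).
    """
    pairs = []
    for m, ranks in enumerate(men_rank):
        for w in range(len(women_rank)):
            if wife[m] == w:
                continue
            m_gains = wife[m] is None or ranks[w] < ranks[wife[m]]
            w_gains = husband[w] is None or women_rank[w][m] < women_rank[w][husband[w]]
            if m_gains and w_gains:
                pairs.append((m, w))
    return pairs


def random_path_to_stability(men_rank, women_rank, start_wife, rng, max_steps=100_000):
    """Satisfy randomly chosen blocking pairs until none remain.

    Returns (wife, husband, steps), where steps counts satisfied pairs.
    """
    wife = list(start_wife)
    husband = [None] * len(women_rank)
    for m, w in enumerate(wife):
        if w is not None:
            husband[w] = m
    for steps in range(max_steps):
        pairs = blocking_pairs(men_rank, women_rank, wife, husband)
        if not pairs:
            return wife, husband, steps
        m, w = rng.choice(pairs)
        if wife[m] is not None:       # m's former partner becomes single
            husband[wife[m]] = None
        if husband[w] is not None:    # w's former partner becomes single
            wife[husband[w]] = None
        wife[m], husband[w] = w, m
    raise RuntimeError("no stable matching reached within max_steps")


rng = random.Random(1)
n = 5
men_rank = [rng.sample(range(n), n) for _ in range(n)]
women_rank = [rng.sample(range(n), n) for _ in range(n)]
wife, husband, steps = random_path_to_stability(men_rank, women_rank, list(range(n)), rng)
```

The returned `steps` value is the quantity of interest in the abstract: the number of blocking pairs satisfied before stability. Repeating the run over markets with different preference correlation would give its distribution.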

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We present a framework for fitting multiple random walks to animal movement paths consisting of ordered sets of step lengths and turning angles. Each step and turn is assigned to one of a number of random walks, each characteristic of a different behavioral state. Behavioral state assignments may be inferred purely from movement data or may include the habitat type in which the animals are located. Switching between different behavioral states may be modeled explicitly using a state transition matrix estimated directly from data, or switching probabilities may take into account the proximity of animals to landscape features. Model fitting is undertaken within a Bayesian framework using the WinBUGS software. These methods allow for identification of different movement states using several properties of observed paths and lead naturally to the formulation of movement models. Analysis of relocation data from elk released in east-central Ontario, Canada, suggests a biphasic movement behavior: elk are either in an "encamped" state in which step lengths are small and turning angles are high, or in an "exploratory" state, in which daily step lengths are several kilometers and turning angles are small. Animals encamp in open habitat (agricultural fields and opened forest), but the exploratory state is not associated with any particular habitat type.
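To make the two-state movement model concrete, the sketch below simulates (rather than fits) a path that switches between an "encamped" and an "exploratory" random walk through a state transition matrix. Every numeric value is a made-up placeholder, and the exponential step lengths and von Mises turning angles are assumptions chosen for illustration; the abstract's actual fitting is done in WinBUGS and is not reproduced here.

```python
import math
import random

# ASSUMPTION: placeholder parameters, not estimates from the elk data.
STATES = {
    # short steps, turning angles concentrated around pi (frequent reversals)
    "encamped": {"mean_step_km": 0.1, "turn_mu": math.pi, "turn_kappa": 1.0},
    # long steps, turning angles concentrated around 0 (directed travel)
    "exploratory": {"mean_step_km": 3.0, "turn_mu": 0.0, "turn_kappa": 2.0},
}

# Row = current state, entries = probability of the next state.
TRANSITION = {
    "encamped": {"encamped": 0.9, "exploratory": 0.1},
    "exploratory": {"encamped": 0.2, "exploratory": 0.8},
}


def simulate_path(n_steps, rng, start_state="encamped"):
    """Return a list of (x_km, y_km, state) with n_steps + 1 entries.

    Each entry after the first is labeled with the state that generated
    the step leading to it.
    """
    x = y = heading = 0.0
    state = start_state
    path = [(x, y, state)]
    for _ in range(n_steps):
        params = STATES[state]
        step = rng.expovariate(1.0 / params["mean_step_km"])
        turn = rng.vonmisesvariate(params["turn_mu"], params["turn_kappa"])
        heading = (heading + turn) % (2 * math.pi)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y, state))
        probs = TRANSITION[state]
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return path


path = simulate_path(200, random.Random(42))
```

Making the rows of `TRANSITION` depend on habitat type or on distance to a landscape feature would correspond to the covariate-dependent switching mentioned in the abstract.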

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objective. To determine the accuracy of the urine protein:creatinine ratio (pr:cr) in predicting 300 mg of protein in a 24-hour urine collection in pregnant patients with suspected preeclampsia. Methods. A systematic review was performed. Articles were identified through electronic databases, and relevant citations were found by hand-searching textbooks and review articles. Included studies evaluated patients for suspected preeclampsia with a 24-hour urine sample and a pr:cr. Only English-language articles were included. Studies that had patients with chronic illness such as chronic hypertension, diabetes mellitus or renal impairment were excluded from the review. Two researchers extracted accuracy data for pr:cr relative to a gold standard of 300 mg of protein in a 24-hour sample, as well as population and study characteristics. The data were analyzed and summarized in tabular and graphical form. Results. Sixteen studies were identified, of which only three, with 510 patients in total, met our inclusion criteria. The studies evaluated different cut-points for positivity of pr:cr, from 130 mg/g to 700 mg/g. Sensitivities and specificities were 90-93% and 33-65%, respectively, for a pr:cr of 130-150 mg/g; 81-95% and 52-80% for a pr:cr of 300 mg/g; and 85-87% and 96-97% for a pr:cr of 600-700 mg/g. Conclusion. The value of a random pr:cr to exclude preeclampsia is limited because even low levels of pr:cr (130-150 mg/g) may miss up to 10% of patients with significant proteinuria. A pr:cr of more than 600 mg/g may obviate a 24-hour collection.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. A synthetic positive control was also used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing. It has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease of accuracy is a more reliable variable ranking measurement than the mean decrease of Gini.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This data set contains aboveground community biomass (Sown plant community, Weed plant community, Dead plant material, and Unidentified plant material; all measured in biomass as dry weight) and species-specific biomass from the sown species of the main experiment plots of a large grassland biodiversity experiment (the Jena Experiment; see further details below). In the main experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall and small herbs). In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3, 4 functional groups). Plots were maintained by bi-annual weeding and mowing. Aboveground community biomass was harvested twice in 2004 just prior to mowing (during peak standing biomass in late May and in late August) on all experimental plots of the main experiment. This was done by clipping the vegetation at 3 cm above ground in four rectangles of 0.2 x 0.5 m per large plot. The location of these rectangles was assigned prior to each harvest by random selection of coordinates within the core area of the plots (i.e. the central 10 x 15 m). The positions of the rectangles within plots were identical for all plots. The harvested biomass was sorted into categories: individual species for the sown plant species, weed plant species (species not sown at the particular plot), detached dead plant material (i.e., dead plant material in the data file), and remaining plant material that could not be assigned to any category (i.e., unidentified plant material in the data file). All biomass was dried to constant weight (70°C, >= 48 h) and weighed. Sown plant community biomass was calculated as the sum of the biomass of the individual sown species. 
The data for individual samples and the mean over samples for the biomass measures on the community level are given. Overall, analyses of the community biomass data have identified species richness as well as functional group composition as important drivers of a positive biodiversity-productivity relationship.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This data set contains aboveground community biomass (Sown plant community, Weed plant community, Dead plant material, and Unidentified plant material; all measured in biomass as dry weight) and species-specific biomass from the sown species of the main experiment plots of a large grassland biodiversity experiment (the Jena Experiment; see further details below). In the main experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall and small herbs). In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3, 4 functional groups). Plots were maintained by bi-annual weeding and mowing. Aboveground community biomass was harvested twice in 2007 just prior to mowing (during peak standing biomass in early June and in late August) on all experimental plots of the main experiment. This was done by clipping the vegetation at 3 cm above ground in four (May) or three (August) rectangles of 0.2 x 0.5 m per large plot. The location of these rectangles was assigned prior to each harvest by random selection of coordinates within the core area of the plots (i.e. the central 10 x 15 m). The positions of the rectangles within plots were identical for all plots. The harvested biomass was sorted into categories: individual species for the sown plant species, weed plant species (species not sown at the particular plot), detached dead plant material (i.e., dead plant material in the data file), and remaining plant material that could not be assigned to any category (i.e., unidentified plant material in the data file). All biomass was dried to constant weight (70°C, >= 48 h) and weighed. Sown plant community biomass was calculated as the sum of the biomass of the individual sown species. 
The data for individual samples and the mean over samples for the biomass measures on the community level are given. Overall, analyses of the community biomass data have identified species richness as well as functional group composition as important drivers of a positive biodiversity-productivity relationship.