170 results for outlier
Abstract:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines from epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could help uncover new cancer subtypes.
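Two of the preprocessing steps named in this abstract, logarithmic transformation and z-score normalization, can be illustrated with a minimal sketch. The marker intensities below are made-up values, the helper names `log_zscore` and `euclidean` are hypothetical, and Euclidean distance is only an assumed stand-in for the three distance metrics the study evaluated (which the abstract does not name).

```python
import math
import statistics

def log_zscore(columns):
    """Log10-transform each marker column, then scale it to zero mean and unit SD."""
    scaled = {}
    for marker, values in columns.items():
        logged = [math.log10(v) for v in values]
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged)
        scaled[marker] = [(x - mean) / sd for x in logged]
    return scaled

def euclidean(a, b):
    """Euclidean distance between two marker profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical intensities for three samples and two markers (illustration only).
columns = {"CD49b": [10.0, 100.0, 1000.0], "CD10": [1000.0, 100.0, 10.0]}
scaled = log_zscore(columns)

# One profile per sample: (CD49b z-score, CD10 z-score).
profiles = list(zip(scaled["CD49b"], scaled["CD10"]))
distance_first_last = euclidean(profiles[0], profiles[2])
```

A pairwise distance matrix built this way is the usual input to an agglomerative clustering routine; the clustering step itself is omitted here.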
Abstract:
Seventeen bones (sixteen cadaveric bones and one plastic bone) were used to validate a method for reconstructing a surface model of the proximal femur from 2D X-ray radiographs and a statistical shape model that was constructed from thirty training surface models. Unlike previously introduced validation studies, where surface-based distance errors were used to evaluate the reconstruction accuracy, here we propose to use errors measured based on clinically relevant morphometric parameters. For this purpose, a program was developed to robustly extract those morphometric parameters from the thirty training surface models (training population), from the seventeen surface models reconstructed from X-ray radiographs, and from the seventeen ground truth surface models obtained either by a CT-scan reconstruction method or by a laser-scan reconstruction method. A statistical analysis was then performed to classify the seventeen test bones into two categories: normal cases and outliers. This classification step depends on the measured parameters of the particular test bone. If all parameters of a test bone were covered by the training population's parameter ranges, the bone was classified as a normal bone; otherwise it was classified as an outlier bone. Our experimental results showed that there was no statistically significant difference between the morphometric parameters extracted from the reconstructed surface models of the normal cases and those extracted from the reconstructed surface models of the outliers. Therefore, our statistical shape model based reconstruction technique can be used to reconstruct not only the surface model of a normal bone but also that of an outlier bone.
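The range-based classification rule described in this abstract can be sketched as follows. The parameter names and values are hypothetical, and the functions `parameter_ranges` and `classify_bone` are illustrative helpers, not the authors' program.

```python
from typing import Dict, List, Tuple

def parameter_ranges(training: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Minimum and maximum of each morphometric parameter over the training population."""
    names = training[0].keys()
    return {n: (min(b[n] for b in training), max(b[n] for b in training)) for n in names}

def classify_bone(bone: Dict[str, float], ranges: Dict[str, Tuple[float, float]]) -> str:
    """Return 'normal' if every parameter lies inside the training range, else 'outlier'."""
    inside = all(lo <= bone[name] <= hi for name, (lo, hi) in ranges.items())
    return "normal" if inside else "outlier"

# Hypothetical training population with two made-up parameters.
training = [
    {"neck_shaft_angle_deg": 125.0, "head_radius_mm": 22.0},
    {"neck_shaft_angle_deg": 130.0, "head_radius_mm": 24.5},
    {"neck_shaft_angle_deg": 135.0, "head_radius_mm": 26.0},
]
ranges = parameter_ranges(training)

label_a = classify_bone({"neck_shaft_angle_deg": 128.0, "head_radius_mm": 23.0}, ranges)
label_b = classify_bone({"neck_shaft_angle_deg": 142.0, "head_radius_mm": 23.0}, ranges)
```

Under this rule a single parameter outside its training range is enough to mark the whole bone as an outlier, which is why `label_b` differs from `label_a` even though the head radius is identical.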
Abstract:
Model-based calibration of steady-state engine operation is commonly performed with highly parameterized empirical models that are accurate but not very robust, particularly when predicting highly nonlinear responses such as diesel smoke emissions. To address this problem, and to boost the accuracy of more robust non-parametric methods to the same level, GT-Power was used to transform the empirical model input space into multiple input spaces that simplified the input-output relationship and improved the accuracy and robustness of smoke predictions made by three commonly used empirical modeling methods: Multivariate Regression, Neural Networks and the k-Nearest Neighbor method. The availability of multiple input spaces allowed the development of two committee techniques: a 'Simple Committee' technique that used averaged predictions from a set of 10 pre-selected input spaces chosen by the training data and the "Minimum Variance Committee" technique where the input spaces for each prediction were chosen on the basis of disagreement between the three modeling methods. This latter technique equalized the performance of the three modeling methods. The successively increasing improvements resulting from the use of a single best transformed input space (Best Combination Technique), Simple Committee Technique and Minimum Variance Committee Technique were verified with hypothesis testing. The transformed input spaces were also shown to improve outlier detection and to improve k-Nearest Neighbor performance when predicting dynamic emissions with steady-state training data. An unexpected finding was that the benefits of input space transformation were unaffected by changes in the hardware or the calibration of the underlying GT-Power model.
Abstract:
Constructing a 3D surface model from sparse-point data is a nontrivial task. Here, we report an accurate and robust approach for reconstructing a surface model of the proximal femur from sparse-point data and a dense-point distribution model (DPDM). The problem is formulated as a three-stage optimal estimation process. The first stage, affine registration, is to iteratively estimate a scale and a rigid transformation between the mean surface model of the DPDM and the sparse input points. The estimation results of the first stage are used to establish point correspondences for the second stage, statistical instantiation, which stably instantiates a surface model from the DPDM using a statistical approach. This surface model is then fed to the third stage, kernel-based deformation, which further refines the surface model. Handling outliers is achieved by consistently employing the least trimmed squares (LTS) approach with a roughly estimated outlier rate in all three stages. If an optimal value of the outlier rate is preferred, we propose a hypothesis testing procedure to automatically estimate it. We present here our validations using four experiments: (1) a leave-one-out experiment, (2) an experiment evaluating the present approach for handling pathology, (3) an experiment evaluating the present approach for handling outliers, and (4) an experiment on reconstructing surface models of seven dry cadaver femurs using clinically relevant data without noise and with noise added. Our validation results demonstrate the robust performance of the present approach in handling outliers, pathology, and noise. An average 95-percentile error of 1.7-2.3 mm was found when the present approach was used to reconstruct surface models of the cadaver femurs from sparse-point data with noise added.
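The least trimmed squares idea mentioned in this abstract can be illustrated on a deliberately simplified problem: estimating a single 1-D location from data with a roughly known outlier rate. This is only a sketch of the trimming principle, not the three-stage 3D reconstruction; the function name `lts_location`, the median starting point, and the sample values are assumptions.

```python
import statistics

def lts_location(values, outlier_rate, iterations=20):
    """Least-trimmed-squares estimate of a 1-D location via concentration steps."""
    # Number of points assumed to be inliers, given the rough outlier rate.
    h = max(1, int(round(len(values) * (1.0 - outlier_rate))))
    estimate = statistics.median(values)  # assumed robust starting point
    kept = list(values)
    for _ in range(iterations):
        # Keep the h points with the smallest squared residuals, then refit.
        kept = sorted(values, key=lambda v: (v - estimate) ** 2)[:h]
        new_estimate = sum(kept) / h
        if new_estimate == estimate:
            break
        estimate = new_estimate
    return estimate, kept

# Made-up measurements: five consistent values and two gross outliers.
values = [1.0, 1.2, 0.9, 1.1, 0.8, 9.0, 10.0]
estimate, kept = lts_location(values, outlier_rate=0.3)
```

Each pass refits on the subset with the smallest residuals, so the two gross values never influence the final estimate as long as the assumed outlier rate is not too low.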
Abstract:
Hardwoods comprise about half of the biomass of forestlands in North America and serve many economic, ecological and aesthetic functions. Forest trees rely on the genetic variation within tree populations to overcome the many biotic, abiotic and anthropogenic factors, further worsened by climate change, that threaten their continued survival and functionality. To harness the inherent genetic variation of tree populations, informed knowledge of the genomic resources and techniques, which are currently lacking or very limited, is imperative for forest managers. The current study therefore aimed to develop genomic microsatellite markers for the leguminous tree species honey locust, Gleditsia triacanthos L., and to test their applicability in assessing genetic variation, estimating gene flow patterns and identifying a full-sib mapping population. We also aimed to test the usefulness of already developed nuclear and gene-based microsatellite markers in the delineation of species and taxonomic relationships between four of the taxonomically difficult Section Lobatae species (Quercus coccinea, Q. ellipsoidalis, Q. rubra and Q. velutina). We recorded 100% amplification of G. triacanthos genomic microsatellites developed using Illumina sequencing techniques in a panel of seven unrelated individuals, with 14 of these showing high polymorphism and reproducibility. When characterized in 36 natural population samples, we recorded 20 alleles per locus with no indication of null alleles at 13 of the 14 microsatellites. This is the first report of genomic microsatellites for this species. Honey locust trees occur in fragmented populations on abandoned farmlands and pastures and are described as essentially dioecious. Pollen dispersal is the main source of gene flow within and between populations, with the ability to offset the effects of random genetic drift.
Factors known to influence gene flow include fragmentation and degree of isolation, which makes understanding the patterns of gene flow in fragmented populations of honey locust a necessity for their sustainable management. In this follow-up study, we used a subset of nine of the 14 developed gSSRs to estimate gene flow and identify a full-sib mapping population in two isolated fragments of honey locust. Our analyses indicated that the majority of the seedlings (65-100%, at both strict and relaxed assignment thresholds) were sired by pollen from outside the two fragment populations. Only one selfing event was recorded, confirming the functional dioeciousness of honey locust and that the seed parents are almost completely outcrossed. From the Butternut Valley, TN population, pollen donor genotypes were reconstructed and used in paternity assignment analyses to identify a relatively large full-sib family comprised of 149 individuals, proving the usefulness of isolated forest fragments in the identification of full-sib families. In the Ames Plantation stand, contemporary pollen dispersal followed a fat-tailed exponential-power distribution, an indication of effective gene flow. Our estimate of δ was 4,282.28 m, suggesting that insect pollinators of honey locust disperse pollen over very long distances. The high proportion of pollen influx into our sampled population implies that our fragment population forms part of a large effectively reproducing population. The high tendency of oak species to hybridize while still maintaining their species identity makes it difficult to resolve their taxonomic relationships. Oaks of the section Lobatae are famous in this regard and remain unresolved at both morphological and genetic markers. We applied 28 microsatellite markers, including outlier loci with potential roles in reproductive isolation and adaptive divergence between species, to natural populations of four known interfertile red oaks, Q. coccinea, Q. ellipsoidalis, Q. rubra and Q. velutina.
To better resolve the taxonomic relationships in this difficult clade, we assigned individual samples to species, identified hybrids and introgressive forms and reconstructed phylogenetic relationships among the four species after exclusion of genetically intermediate individuals. Genetic assignment analyses identified four distinct species clusters, with Q. rubra most differentiated from the three other species, but also with a comparatively large number of misclassified individuals (7.14%), hybrids (7.14%) and introgressive forms (18.83%) between Q. ellipsoidalis and Q. velutina. After the exclusion of genetically intermediate individuals, Q. ellipsoidalis grouped as sister species to the largely parapatric Q. coccinea with high bootstrap support (91 %). Genetically intermediate forms in a mixed species stand were located proximate to both potential parental species, which supports recent hybridization of Q. velutina with both Q. ellipsoidalis and Q. rubra. Analyses of genome-wide patterns of interspecific differentiation can provide a better understanding of speciation processes and taxonomic relationships in this taxonomically difficult group of red oak species.
Abstract:
Background: Speciation reversal, the erosion of species differentiation via an increase in introgressive hybridization due to the weakening of previously divergent selection regimes, is thought to be an important, yet poorly understood, driver of biodiversity loss. Our study system, the Alpine whitefish (Coregonus spp.) species complex, is a classic example of a recent postglacial adaptive radiation, forming an array of endemic lake flocks, with the independent origination of similar ecotypes among flocks. However, many of the lakes of the Alpine radiation have been seriously impacted by anthropogenic nutrient enrichment, resulting in a collapse in neutral genetic and phenotypic differentiation within the most polluted lakes. Here we investigate the effects of eutrophication on the selective forces that have shaped this radiation, using population genomics. We studied eight sympatric species assemblages belonging to five independent parallel adaptive radiations, and one species pair in secondary contact. We used AFLP markers, and applied FST outlier (BAYESCAN, DFDIST) and logistic regression analyses (MATSAM), to identify candidate regions for disruptive selection in the genome and their associations with adaptive traits within each lake flock. The numbers of outlier and adaptive trait associated loci identified per lake were then regressed against two variables (historical phosphorus concentration and contemporary oxygen concentration) representing the strength of eutrophication. Results: Whilst we identify disruptive selection candidate regions in all lake flocks, we find similar trends, across analysis methods, towards fewer disruptive selection candidate regions and fewer adaptive trait/candidate loci associations in the more polluted lakes. Conclusions: Weakened disruptive selection and a concomitant breakdown in reproductive isolating mechanisms in more polluted lakes have led to increased gene flow between coexisting Alpine whitefish species.
We hypothesize that the resulting higher rates of interspecific recombination reduce either the number or extent of genomic islands of divergence surrounding loci evolving under disruptive natural selection. This produces the negative trend seen in the number of selection candidate loci recovered during genome scans of whitefish species flocks, with increasing levels of anthropogenic eutrophication: as the likelihood decreases that AFLP restriction sites will fall within regions of heightened genomic divergence and therefore be classified as FST outlier loci. This study explores for the first time the potential effects of human-mediated relaxation of disruptive selection on heterogeneous genomic divergence between coexisting species.
Abstract:
Cichlid fishes have evolved tremendous morphological and behavioral diversity in the waters of East Africa. Within each of the Great Lakes Tanganyika, Malawi, and Victoria, the phenomena of hybridization and retention of ancestral polymorphism explain allele sharing across species. Here, we explore the sharing of single nucleotide polymorphisms (SNPs) between the major East African cichlid assemblages. A set of approximately 200 genic and nongenic SNPs was ascertained in five Lake Malawi species and genotyped in a diverse collection of 160 species from across Africa. We observed segregating polymorphism outside of the Malawi lineage for more than 50% of these loci; this holds similarly for genic versus nongenic SNPs, as well as for SNPs at putative CpG versus non-CpG sites. Bayesian and principal component analyses of genetic structure in the data demonstrate that the Lake Malawi endemic flock is not monophyletic and that river species have likely contributed significantly to Malawi genomes. Coalescent simulations support the hypothesis that river cichlids have transported polymorphism between lake assemblages. We observed strong genetic differentiation between Malawi lineages for approximately 8% of loci, with contributions from both genic and nongenic SNPs. Notably, more than half of these outlier loci between Malawi groups are polymorphic outside of the lake. Cichlid fishes have evolved diversity in Lake Malawi as new mutations combined with standing genetic variation shared across East Africa.
Abstract:
Range expansions are extremely common, but have only recently begun to attract attention in terms of their genetic consequences. As populations expand, demes at the wave front experience strong genetic drift, which is expected to reduce genetic diversity and potentially cause ‘allele surfing’, where alleles may become fixed over a wide geographical area even if their effects are deleterious. Previous simulation models show that range expansions can generate very strong selective gradients on dispersal, reproduction, competition and immunity. To investigate the effects of range expansion on genetic diversity and adaptation, we studied the population genomics of the bank vole (Myodes glareolus) in Ireland. The bank vole was likely introduced in the late 1920s and is expanding its range at a rate of ~2.5 km/year. Using genotyping-by-sequencing, we genotyped 281 bank voles at 5979 SNP loci. Fourteen sample sites were arranged in three transects running from the introduction site to the wave front of the expansion. We found significant declines in genetic diversity along all three transects. However, there was no evidence that sites at the wave front had accumulated more deleterious mutations. We looked for outlier loci with strong correlations between allele frequency and distance from the introduction site, where the direction of correlation was the same in all three transects. Amongst these outliers, we found significant enrichment for genic SNPs, suggesting the action of selection. Candidates for selection included several genes with immunological functions and several genes that could influence behaviour.
Abstract:
Chondrostoma nasus is a cyprinid fish with highly specialized, ecologically and geographically distinct, ontogenetic trophic niches. Nase population numbers across their Swiss range have shown massive declines and many localized extinctions. Here we integrate data from different genetic markers with phenotypic and demographic data to survey patterns of neutral and adaptive genetic diversity in all extant (and one extinct) Swiss nase populations, with the aim to delineate intraspecific conservation units (CUs) and to inform future population management strategies. We discovered two major genetically and geographically distinct population groupings. The first population grouping comprises nase inhabiting rivers flowing into Lake Constance; the second comprises nase populations from Rhine drainages below Lake Constance. Within these clusters there is generally limited genetic differentiation among populations. Genomic outlier scans based on 256–377 polymorphic AFLP loci revealed little evidence of local adaptation both within and among population clusters, with the exception of one candidate locus identified in scans involving the inbred Schanzengraben population. However, significant phenotypic differentiation in body shape between certain populations suggests a need for more intensive future studies of local adaptation. Our data strongly suggest that the two major population groups should be treated as distinct CUs, with any supplemental stocking and reintroductions sourced only from within the range of the CU concerned.
Abstract:
BACKGROUND The distribution of the enzymopathy glucose-6-phosphate dehydrogenase (G6PD) deficiency is linked to areas of high malaria endemicity due to its association with protection from disease. G6PD deficiency is also identified as the cause of severe haemolysis following administration of the anti-malarial drug primaquine and further use of this drug will likely require identification of G6PD deficiency on a population level. Current conventional methods for G6PD screening have various disadvantages for field use. METHODS The WST8/1-methoxy PMS method, recently adapted for field use, was validated using a gold standard enzymatic assay (R&D Diagnostics Ltd ®) in a study involving 235 children under five years of age, who were recruited by random selection from a cohort study in Tororo, Uganda. Blood spots were collected by finger-prick onto filter paper at routine visits, and G6PD activity was determined by both tests. Performance of the WST8/1-methoxy PMS test under various temperature, light, and storage conditions was evaluated. RESULTS The WST8/1-methoxy PMS assay was found to have 72% sensitivity and 98% specificity when compared to the commercial enzymatic assay and the AUC was 0.904, suggesting good agreement. Misclassifications were at borderline values of G6PD activity between mild and normal levels, or related to outlier haemoglobin values (<8.0 gHb/dl or >14 gHb/dl) associated with ongoing anaemia or recent haemolytic crises. Although severe G6PD deficiency was not found in the area, the test enabled identification of low G6PD activity. The assay was found to be highly robust for field use; showing less light sensitivity, good performance over a wide temperature range, and good capacity for medium-to-long term storage. CONCLUSIONS The WST8/1-methoxy PMS assay was comparable to the currently used standard enzymatic test, and offers advantages in terms of cost, storage, portability and use in resource-limited settings. 
Such features make this test a potential key tool for deployment in the field for point of care assessment prior to primaquine administration in malaria-endemic areas. As with other G6PD tests, outlier haemoglobin levels may confound G6PD level estimation.
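For readers unfamiliar with how sensitivity and specificity are derived when a candidate test is compared with a reference assay, the arithmetic can be sketched as below. The classifications are invented purely for illustration and do not reproduce the study's data; `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(reference, test):
    """Compare a candidate test with a reference assay (True = classified deficient)."""
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r and t)          # deficient, detected
    fn = sum(1 for r, t in pairs if r and not t)      # deficient, missed
    tn = sum(1 for r, t in pairs if not r and not t)  # normal, correctly cleared
    fp = sum(1 for r, t in pairs if not r and t)      # normal, wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)

# Invented results for 60 samples: 10 deficient and 50 normal by the reference assay.
reference = [True] * 10 + [False] * 50
candidate = [True] * 8 + [False] * 2 + [False] * 45 + [True] * 5
sens, spec = sensitivity_specificity(reference, candidate)
```

Sensitivity depends only on the samples the reference assay calls deficient and specificity only on those it calls normal, so misclassifications near the borderline between mild and normal activity can lower one without affecting the other.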
Abstract:
Aims The biochemical defense of lichens against herbivores and its relationship to lichen frequency are poorly understood. Therefore, we tested whether chemical compounds in lichens act as a feeding defense or rather as a stimulus for snail herbivory among lichens and whether experimental feeding by snails is related to lichen frequency in the field. Methods In a no-choice feeding experiment, we fed 24 lichen species to snails of two taxa from the Clausilidae and Enidae families and compared untreated lichens and lichens with compounds removed by acetone rinsing. Then, we related experimental lichen consumption with the frequency of lichen species among 158 forest plots in the field (Schwäbische Alb, Germany), where we had also sampled snail and lichen species. Important findings In five lichen species, snails preferred treated samples over untreated controls, indicating chemical feeding defense, and vice versa in two species, indicating chemical feeding stimulus. Interestingly, compared with less frequent lichen species, snails consumed more of untreated and less of treated samples of more frequent lichen species. Removing one outlier species resulted in the loss of a significant positive relationship when untreated samples were analyzed separately. However, the interaction between treatment and lichen frequency remained significant when excluding single species or including snail genus instead of taxa, indicating that our results were robust and that lumping the species into two taxa was justified. Our results imply that lichen-feeding snails prefer frequent lichens and avoid less frequent ones because of secondary compound recognition. This supports the idea that consumers adapt to the most abundant food source.
Abstract:
Tropical forests are believed to be very harsh environments for human life. It is unclear whether human beings would have ever subsisted in those environments without external resources. It is therefore possible that humans have developed recent biological adaptations in response to specific selective pressures to cope with this challenge. To understand such biological adaptations we analyzed genome-wide SNP data under a Bayesian statistics framework, looking for outlier markers with an overly large extent of differentiation between populations living in a tropical forest, as compared to genetically related populations living outside the forest in Africa and the Americas. The most significant positive selection signals were found in genes related to lipid metabolism, the immune system, body development, and RNA Polymerase III transcription initiation. The results are discussed in the light of putative tropical forest selective pressures, namely food scarcity, high prevalence of pathogens, difficulty of movement, and inefficient thermoregulation. Agreement between our results and previous studies on the pygmy phenotype, a putative prototype of forest adaptation, was found, suggesting that a few genetic regions previously described as associated with short stature may be evolving under similar positive selection in Africa and the Americas. In general, convergent evolution was less pervasive than local adaptation in one single continent, suggesting that Africans and Amerindians may have followed different routes to adapt to similar environmental selective pressures.
Abstract:
Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures.
Abstract:
When considering data from many trials, it is likely that some of them present a markedly different intervention effect or exert an undue influence on the summary results. We develop a forward search algorithm for identifying outlying and influential studies in meta-analysis models. The forward search algorithm starts by fitting the hypothesized model to a small subset of likely outlier-free studies and proceeds by adding studies into the set one-by-one that are determined to be closest to the fitted model of the existing set. As each study is added to the set, plots of estimated parameters and measures of fit are monitored to identify outliers by sharp changes in the forward plots. We apply the proposed outlier detection method to two real data sets; a meta-analysis of 26 studies that examines the effect of writing-to-learn interventions on academic achievement adjusting for three possible effect modifiers, and a meta-analysis of 70 studies that compares a fluoride toothpaste treatment to placebo for preventing dental caries in children. A simple simulated example is used to illustrate the steps of the proposed methodology, and a small-scale simulation study is conducted to evaluate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
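The forward search steps described in this abstract can be sketched for the simplest possible case. The sketch assumes a fixed-effect (inverse-variance weighted) pooled mean with no effect modifiers, picks the initial subset as the studies closest to the median effect, and uses made-up effect sizes; none of these choices is taken from the paper, and the function names are hypothetical.

```python
import statistics

def pooled_effect(studies):
    """Inverse-variance weighted (fixed-effect) mean of (effect, variance) pairs."""
    weights = [1.0 / var for _, var in studies]
    return sum(w * eff for w, (eff, _) in zip(weights, studies)) / sum(weights)

def forward_search(studies, start_size=3):
    """Return the order in which studies enter and the pooled estimate after each step."""
    median_effect = statistics.median([eff for eff, _ in studies])
    # Assumed initial subset: the studies closest to the median effect.
    ranked = sorted(range(len(studies)), key=lambda i: abs(studies[i][0] - median_effect))
    subset, remaining = ranked[:start_size], ranked[start_size:]
    order = list(subset)
    trajectory = [pooled_effect([studies[i] for i in subset])]
    while remaining:
        current = trajectory[-1]
        # Add the remaining study closest to the current fit (standardized distance).
        nearest = min(remaining, key=lambda i: abs(studies[i][0] - current) / studies[i][1] ** 0.5)
        remaining.remove(nearest)
        subset.append(nearest)
        order.append(nearest)
        trajectory.append(pooled_effect([studies[i] for i in subset]))
    return order, trajectory

# Made-up (effect, variance) pairs; the last study is deliberately discordant.
studies = [(0.10, 0.01), (0.12, 0.02), (0.08, 0.01), (0.11, 0.015), (0.90, 0.02)]
order, trajectory = forward_search(studies)
last_jump = trajectory[-1] - trajectory[-2]
```

In this toy run the discordant study enters last, and the sharp change in the pooled estimate at that step is the kind of signal a forward plot is monitored for.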
Abstract:
Previous studies (e.g., Hamori, 2000; Ho and Tsui, 2003; Fountas et al., 2004) find high volatility persistence of economic growth rates using generalized autoregressive conditional heteroskedasticity (GARCH) specifications. This paper reexamines the Japanese case, using the same approach and showing that this finding of high volatility persistence reflects the Great Moderation, which features a sharp decline in the variance as well as two falls in the mean of the growth rates identified by Bai and Perron's (1998, 2003) multiple structural change test. Our empirical results provide new evidence. First, excess kurtosis drops substantially or disappears in the GARCH or exponential GARCH model that corrects for an additive outlier. Second, using the outlier-corrected data, the integrated GARCH effect or high volatility persistence remains in the specification once we introduce intercept-shift dummies into the mean equation. Third, the time-varying variance falls sharply, only when we incorporate the break in the variance equation. Fourth, the ARCH in mean model finds no effects of our more correct measure of output volatility on output growth or of output growth on its volatility.