132 results for DATA SET
Abstract:
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels.
Abstract:
The paper deals with the development and application of a methodology for the automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool for solving this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with the k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of mapping is controlled by analysing the raw data and the residuals using variography. Maps of the probability of exceeding a given decision level and "thick" isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. The real case study is based on the mapping of radioactively contaminated territories.
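As a rough illustration of the approach described in the abstract above: a GRNN prediction can be written as a Gaussian-kernel weighted average of the training values, and the isotropic kernel width can be chosen by leave-one-out cross-validation. The sketch below is plain Python. The function names, the toy data and the candidate widths are assumptions made for illustration and are not taken from the paper.

```python
import math

def grnn_predict(train_xy, train_z, query, sigma):
    """Isotropic GRNN estimate: Gaussian-kernel weighted mean of training values."""
    num = 0.0
    den = 0.0
    for (x, y), z in zip(train_xy, train_z):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    # If every weight underflows to zero, fall back to the global mean.
    return num / den if den > 0.0 else sum(train_z) / len(train_z)

def loo_cv_error(train_xy, train_z, sigma):
    """Leave-one-out mean squared error for one candidate kernel width."""
    total = 0.0
    n = len(train_z)
    for i in range(n):
        xy = train_xy[:i] + train_xy[i + 1:]
        z = train_z[:i] + train_z[i + 1:]
        pred = grnn_predict(xy, z, train_xy[i], sigma)
        total += (pred - train_z[i]) ** 2
    return total / n

def tune_sigma(train_xy, train_z, candidates):
    """Pick the candidate width with the smallest leave-one-out error."""
    return min(candidates, key=lambda s: loo_cv_error(train_xy, train_z, s))

# Toy data (hypothetical): four corners of a unit square plus its centre.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 2.0, 3.0, 2.0]
candidates = [0.1, 0.3, 0.5, 1.0, 2.0]
best_sigma = tune_sigma(points, values, candidates)
estimate = grnn_predict(points, values, (0.5, 0.5), best_sigma)
```

An anisotropic variant would use one width per coordinate instead of a single `sigma`, at the cost of a larger search over candidate widths.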
Abstract:
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
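The replicate-based idea in the abstract above can be sketched in a few lines: two replicates of the same sample should carry identical genotypes, so disagreements between them estimate error. The dictionary layout, the function name and the two rates computed here are simplified assumptions for illustration. They are not the output format of Stacks and not necessarily the exact definitions used by the authors.

```python
def replicate_error_rates(rep_a, rep_b):
    """Compare two replicates given as dicts: locus -> tuple of alleles, or None if missing."""
    loci = set(rep_a) | set(rep_b)
    # Loci genotyped in both replicates.
    shared = [l for l in loci
              if rep_a.get(l) is not None and rep_b.get(l) is not None]
    # Loci genotyped in exactly one of the two replicates.
    in_one_only = sum(1 for l in loci
                      if (rep_a.get(l) is None) != (rep_b.get(l) is None))
    # Genotype disagreements at shared loci; allele order is ignored.
    mismatches = sum(1 for l in shared
                     if sorted(rep_a[l]) != sorted(rep_b[l]))
    return {
        "locus_dropout_rate": in_one_only / len(loci) if loci else 0.0,
        "genotype_mismatch_rate": mismatches / len(shared) if shared else 0.0,
        "shared_loci": len(shared),
    }

# Invented genotypes for one sample sequenced twice.
rep1 = {"L1": ("A", "A"), "L2": ("A", "G"), "L3": ("C", "T"), "L4": None}
rep2 = {"L1": ("A", "A"), "L2": ("G", "A"), "L3": ("C", "C"), "L4": ("T", "T")}
rates = replicate_error_rates(rep1, rep2)
```

Running such a comparison for each candidate set of assembly parameters would give one way to rank those parameters by error.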
Abstract:
Molecular monitoring of BCR/ABL transcripts by real time quantitative reverse transcription PCR (qRT-PCR) is an essential technique for clinical management of patients with BCR/ABL-positive CML and ALL. Though quantitative BCR/ABL assays are performed in hundreds of laboratories worldwide, results among these laboratories cannot be reliably compared due to heterogeneity in test methods, data analysis, reporting, and lack of quantitative standards. Recent efforts towards standardization have been limited in scope. Aliquots of RNA were sent to clinical test centers worldwide in order to evaluate methods and reporting for e1a2, b2a2, and b3a2 transcript levels using their own qRT-PCR assays. Total RNA was isolated from tissue culture cells that expressed each of the different BCR/ABL transcripts. Serial log dilutions were prepared, ranging from 10⁰ to 10⁻⁵, in RNA isolated from HL60 cells. Laboratories performed 5 independent qRT-PCR reactions for each sample type at each dilution. In addition, 15 qRT-PCR reactions of the 10⁻³ b3a2 RNA dilution were run to assess reproducibility within and between laboratories. Participants were asked to run the samples following their standard protocols and to report cycle threshold (Ct), quantitative values for BCR/ABL and housekeeping genes, and ratios of BCR/ABL to housekeeping genes for each sample RNA. Thirty-seven (n=37) participants have submitted qRT-PCR results for analysis (36, 37, and 34 labs generated data for b2a2, b3a2, and e1a2, respectively). The limit of detection for this study was defined as the lowest dilution at which a Ct value could be detected for all 5 replicates. For b2a2, 15, 16, 4, and 1 lab(s) showed a limit of detection at the 10⁻⁵, 10⁻⁴, 10⁻³, and 10⁻² dilutions, respectively. For b3a2, 20, 13, and 4 labs showed a limit of detection at the 10⁻⁵, 10⁻⁴, and 10⁻³ dilutions, respectively. For e1a2, 10, 21, 2, and 1 lab(s) showed a limit of detection at the 10⁻⁵, 10⁻⁴, 10⁻³, and 10⁻² dilutions, respectively. 
Log %BCR/ABL ratio values provided a method for comparing results between the different laboratories for each BCR/ABL dilution series. Linear regression analysis revealed concordance among the majority of participant data over the 10⁻¹ to 10⁻⁴ dilutions. The overall slope values showed comparable results among the majority of b2a2 (mean=0.939; median=0.9627; range 0.399 - 1.1872), b3a2 (mean=0.925; median=0.922; range 0.625 - 1.140), and e1a2 (mean=0.897; median=0.909; range 0.5174 - 1.138) laboratory results (Fig. 1-3). Thirty-four (n=34) out of the 37 laboratories reported Ct values for all 15 replicates and only those with a complete data set were included in the inter-lab calculations. Eleven laboratories either did not report their copy number data or used other reporting units such as nanograms or cell numbers; therefore, only 26 laboratories were included in the overall analysis of copy numbers. The median copy number was 348.4, with a range from 15.6 to 547,000 copies (approximately a 4.5 log difference); the median intra-lab %CV was 19.2% with a range from 4.2% to 82.6%. While our international performance evaluation using serially diluted RNA samples has reinforced the fact that heterogeneity exists among clinical laboratories, it has also demonstrated that performance within a laboratory is overall very consistent. Accordingly, the availability of defined BCR/ABL RNAs may facilitate the validation of all phases of quantitative BCR/ABL analysis and may be extremely useful as a tool for monitoring assay performance. Ongoing analyses of these materials, along with the development of additional control materials, may solidify consensus around their application in routine laboratory testing and possible integration in worldwide efforts to standardize quantitative BCR/ABL testing.
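Two of the quantities reported in the abstract above are simple to compute. The limit of detection is the lowest dilution at which all 5 replicates gave a Ct value, and the intra-lab %CV is the standard deviation of replicate measurements expressed as a percentage of their mean. The sketch below is plain Python with invented replicate values. The function names and data layout are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the abstract does not state which form was used.

```python
import statistics

def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def limit_of_detection(ct_by_dilution, replicates=5):
    """Lowest log10 dilution at which every replicate produced a Ct value.

    ct_by_dilution maps a log10 dilution (0, -1, ..., -5) to a list of Ct
    values, with None marking a replicate in which nothing was detected.
    """
    detected = [d for d, cts in ct_by_dilution.items()
                if len(cts) == replicates and all(ct is not None for ct in cts)]
    return min(detected) if detected else None

# Invented copy numbers for three replicates from one laboratory.
cv = percent_cv([90.0, 100.0, 110.0])

# Invented Ct values for one laboratory and one transcript type.
cts = {
    -2: [26.9, 27.1, 27.0, 27.2, 26.8],
    -3: [30.1, 30.3, 30.0, 30.2, 30.4],
    -4: [33.5, None, 34.0, 33.8, 34.1],
    -5: [None, None, None, None, None],
}
lod = limit_of_detection(cts)
```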
Abstract:
Rubisco is responsible for the fixation of CO2 into organic compounds through photosynthesis and is thus of great agronomic importance. It is well established that this enzyme suffers from slow catalysis, and that its low specificity results in photorespiration, which is considered an energy waste for the plant. However, natural variations exist, and some Rubisco lineages, such as in C4 plants, exhibit higher catalytic efficiencies coupled to lower specificities. These C4 kinetics could have evolved as an adaptation to the higher CO2 concentration present in C4 photosynthetic cells. In this study, using phylogenetic analyses on a large data set of C3 and C4 monocots, we showed that the rbcL gene, which encodes the large subunit of Rubisco, evolved under positive selection in independent C4 lineages. This confirms that selective pressures on Rubisco have been switched in C4 plants by the high CO2 environment prevailing in their photosynthetic cells. Eight rbcL codons evolving under positive selection in C4 clades were involved in parallel changes among the 23 independent monocot C4 lineages included in this study. These amino acids are potentially responsible for the C4 kinetics, and their identification opens new avenues for human-directed Rubisco engineering. The introgression of C4-like high-efficiency Rubisco would strongly enhance C3 crop yields in the future CO2-enriched atmosphere.
Abstract:
We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
Abstract:
The fire ant Solenopsis invicta and its close relatives display an important social polymorphism involving differences in colony queen number. Colonies are headed by either a single reproductive queen (monogyne form) or multiple queens (polygyne form). This variation in social organization is associated with variation at the gene Gp-9, with monogyne colonies harboring only B-like allelic variants and polygyne colonies always containing b-like variants as well. We describe naturally occurring variation at Gp-9 in fire ants based on 185 full-length sequences, 136 of which were obtained from S. invicta collected over much of its native range. While there is little overall differentiation between most of the numerous alleles observed, a surprising amount is found in the coding regions of the gene, with such substitutions usually causing amino acid replacements. This elevated coding-region variation may result from a lack of negative selection acting to constrain amino acid replacements over much of the protein, different mutation rates or biases in coding and non-coding sequences, negative selection acting with greater strength on non-coding than coding regions, and/or positive selection acting on the protein. Formal selection analyses provide evidence that the latter force played an important role in the basal b-like lineages coincident with the emergence of polygyny. While our data set reveals considerable paraphyly and polyphyly of S. invicta sequences with respect to those of other fire ant species, the b-like alleles of the socially polymorphic species are monophyletic. An expanded analysis of colonies containing alleles of this clade confirmed the invariant link between their presence and expression of polygyny. 
Finally, our discovery of several unique alleles bearing various combinations of b-like and B-like codons allows us to conclude that no single b-like residue is completely predictive of polygyne behavior and, thus, potentially causally involved in its expression. Rather, all three typical b-like residues appear to be necessary.
Disentangling the effects of key innovations on the diversification of Bromelioideae (Bromeliaceae).
Abstract:
The evolution of key innovations, novel traits that promote diversification, is often seen as a major driver of the unequal distribution of species richness within the tree of life. In this study, we aim to determine the factors underlying the extraordinary radiation of the subfamily Bromelioideae, one of the most diverse clades among the neotropical plant family Bromeliaceae. Based on an extended molecular phylogenetic data set, we examine the effect of two putative key innovations, that is, the Crassulacean acid metabolism (CAM) and the water-impounding tank, on speciation and extinction rates. To this end, we develop a novel Bayesian implementation of the phylogenetic comparative method, binary state speciation and extinction, which enables hypothesis testing by Bayes factors and accommodates uncertainty in model selection through Bayesian model averaging. Both CAM and tank habit were found to correlate with increased net diversification, thus fulfilling the criteria for key innovations. Our analyses further revealed that CAM photosynthesis is correlated with a twofold increase in speciation rate, whereas the evolution of the tank primarily affected extinction rates, which were found to be five times lower in tank-forming lineages than in tank-less clades. These differences are discussed in the light of biogeography, ecology, and past climate change.
Abstract:
Petrographic, mineralogical, and stable isotope (δ13C, δ18O values) compositions were used to characterise marbles and sedimentary carbonate rocks from central Morocco, which are considered to be a likely source of ornamental and building material from Roman times to the present day. This new data set was used in the framework of an archaeometric provenance study on Roman artefacts from the town of Thamusida (Kenitra, north Morocco), to assess the potential use of these rocks for the manufacture of the archaeological materials. A representative set of samples from marbles and other carbonate rocks (limestone, dolostone) was collected in several quarries and outcrops in the Moroccan Meseta, in a region extending from the Meknes-Khenifra alignment to the Atlantic Ocean. All the samples were studied using petrographic, mineralogical and geochemical methods. The petrographic and mineralogical investigations (optical microscopy, electron microscopy, X-ray diffraction) allowed the carbonate rocks to be grouped into limestones, foliated limestone, diagenetic breccias and dolostone. The limestones could be further grouped as mudstones, wackestones-packstones, crinoid grainstones, oolitic grainstone and floatstones. Textural differences allowed marble varieties to be defined. The stable carbon and oxygen isotope composition proved to be quite useful in the discrimination of marble sources, with apparently less discriminatory potential for carbonate rocks.
Abstract:
OBJECTIVE: To elucidate the diagnostic accuracy of granulocyte colony-stimulating factor (G-CSF), interleukin-8 (IL-8), and interleukin-1 receptor antagonist (IL-1ra) in identifying patients with sepsis among critically ill pediatric patients with suspected infection. DESIGN AND SETTING: Nested case-control study in a multidisciplinary neonatal and pediatric intensive care unit (PICU). PATIENTS: PICU patients during a 12-month period with suspected infection, and plasma available from the time of clinical suspicion (254 episodes, 190 patients). MEASUREMENTS AND RESULTS: Plasma levels of G-CSF, IL-8, and IL-1ra. Episodes classified on the basis of clinical and bacteriological findings into: culture-confirmed sepsis, probable sepsis, localized infection, viral infection, and no infection. Plasma levels were significantly higher in episodes of culture-confirmed sepsis than in episodes with ruled-out infection. The area under the receiver operating characteristic curve was higher for IL-8 and G-CSF than for IL-1ra. Combining IL-8 and G-CSF improved the diagnostic performance, particularly as to the detection of Gram-negative sepsis. Sensitivity was low (<50%) in detecting Staphylococcus epidermidis bacteremia or localized infections. CONCLUSIONS: In this heterogeneous population of critically ill children with suspected infection, a model combining plasma levels of IL-8 and G-CSF identified patients with sepsis. Negative results do not rule out S. epidermidis bacteremia or locally confined infectious processes. The model requires validation in an independent data set.
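The area under the receiver operating characteristic curve mentioned in the abstract above has a direct pairwise reading: it is the fraction of (septic, non-septic) episode pairs in which the septic episode has the higher marker level, with ties counted as one half. The sketch below computes it that way in plain Python. The function name and the marker values are invented for illustration and have no connection to the study data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented plasma marker levels, in arbitrary units.
sepsis_levels = [5.0, 3.0, 4.0]
no_infection_levels = [1.0, 3.0, 2.0]
auc = roc_auc(sepsis_levels, no_infection_levels)
```

A value of 0.5 corresponds to a marker with no discriminating power and 1.0 to perfect separation of the two groups.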
Abstract:
In many fields, the spatial clustering of sampled data points has many consequences. Therefore, several indices have been proposed to assess the level of clustering affecting data sets (e.g. the Morisita index, Ripley's K-function and Rényi's generalized entropy). The classical Morisita index measures how many times more likely it is to select two measurement points from the same quadrat (the data set is covered by a regular grid of changing size) than it would be in the case of a random distribution generated from a Poisson process. The multipoint version (k-Morisita) takes into account k points with k >= 2. The present research deals with a new development of the k-Morisita index for (1) monitoring network characterization and for (2) detection of patterns in monitored phenomena. From a theoretical perspective, a connection between the k-Morisita index and multifractality has also been found and highlighted on a mathematical multifractal set.
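The classical Morisita index described in the abstract above can be computed directly from quadrat counts: with Q quadrats, N points and n_i points in quadrat i, the index is Q * sum(n_i * (n_i - 1)) / (N * (N - 1)). The sketch below is plain Python. It assumes points lying in a square of side `extent` with its corner at the origin, and the function and parameter names are hypothetical.

```python
def morisita_index(points, cells_per_side, extent=1.0):
    """Classical Morisita index on a regular grid of cells_per_side**2 quadrats."""
    counts = {}
    for x, y in points:
        # Map each point to a grid cell; points on the upper edge go in the last cell.
        i = min(int(x / extent * cells_per_side), cells_per_side - 1)
        j = min(int(y / extent * cells_per_side), cells_per_side - 1)
        counts[(i, j)] = counts.get((i, j), 0) + 1
    n = len(points)
    q = cells_per_side ** 2
    # Ordered pairs of points that fall inside the same quadrat.
    pairs_same = sum(c * (c - 1) for c in counts.values())
    return q * pairs_same / (n * (n - 1))

# Four points packed into one quadrat of a 2 x 2 grid (strongly clustered).
clustered = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2)]
# One point per quadrat (perfectly regular).
regular = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
i_clustered = morisita_index(clustered, 2)
i_regular = morisita_index(regular, 2)
```

Values near 1 are expected for a random (Poisson) pattern, values above 1 indicate clustering and values below 1 indicate a regular, dispersed pattern. Repeating the calculation for several grid sizes gives the index as a function of quadrat size.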
Multimodel inference and multimodel averaging in empirical modeling of occupational exposure levels.
Abstract:
Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. Traditional data-driven methods used to choose a model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models is defined a priori by taking into account the sample size and previous knowledge of the variables that influence exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as an Akaike weight, to be interpreted as the probability of the model being the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for Work and Health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference represents a promising procedure for modeling exposure levels that incorporates the notion that several models can be supported by the data and makes it possible to evaluate, to a certain extent, model selection uncertainty, which is seldom mentioned in current practice.
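The Akaike weights mentioned in the abstract above follow from the AIC values alone: each model's AIC difference from the best model, delta_i, is turned into exp(-delta_i / 2) and the results are normalised to sum to one. The sketch below shows this in plain Python, together with a weighted average of one effect estimate across models. The AIC values and effect estimates are invented, and the function names are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-delta/2), normalised over the model set."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def model_averaged_estimate(estimates, weights):
    """Weighted mean of one effect estimate, assuming it appears in every model."""
    return sum(e * w for e, w in zip(estimates, weights))

# Invented AIC values for three candidate exposure models.
aics = [100.0, 102.0, 110.0]
weights = akaike_weights(aics)

# Invented estimates of the same determinant's effect in each model.
effects = [0.50, 0.40, 0.10]
averaged_effect = model_averaged_estimate(effects, weights)
```

With these invented numbers the first model carries roughly 73% of the weight, so the averaged effect stays close to its estimate while still reflecting the second model.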
Abstract:
Intensification of farming over the past 50 years has homogenised the landscape structure and contributed to the decline of bird populations in Europe. To better target the conservation of the Barn Owl Tyto alba, we assessed the influence of the landscape structure on breeding performance in western Switzerland. The analyses considered a 23-year data set of breeding parameters collected in an area dominated by intensive agriculture. Using a Geographic Information System approach, landscape characteristics were described around 194 nest sites. Our analyses showed that nest-box occupancy, laying date, clutch and brood size, egg volume and probability of producing a second annual clutch were not significantly associated with any of the eight principal landscape variables (agricultural land, woodland, urban area, hedgerows, cereals, sugar beet, maize and meadow). Nevertheless, the probability that a breeding pair occupied a nest-box decreased the more roads there were surrounding the nest-box. The absence of strong associations between habitat features and breeding parameters suggests that prey availability may be relatively similar between the different breeding sites. In our study area Barn Owls can always find suitable foraging habitats around most nest-boxes.
Abstract:
Motivation. The study of human brain development in its early stage is today possible thanks to in vivo fetal magnetic resonance imaging (MRI) techniques. A quantitative analysis of the fetal cortical surface represents a new approach which can be used as a marker of cerebral maturation (such as gyration) and also for studying central nervous system pathologies [1]. However, this quantitative approach is a major challenge for several reasons. First, movement of the fetus inside the amniotic cavity requires very fast MRI sequences to minimize motion artifacts, resulting in a poor spatial resolution and/or lower SNR. Second, due to the ongoing myelination and cortical maturation, the appearance of the developing brain differs very much from the homogeneous tissue types found in adults. Third, due to low resolution, fetal MR images suffer considerably from the partial volume (PV) effect, sometimes in large areas. Today extensive efforts are made to deal with the reconstruction of high resolution 3D fetal volumes [2,3,4] to cope with intra-volume motion and low SNR. However, few studies exist related to the automated segmentation of MR fetal imaging. [5] and [6] work on the segmentation of specific areas of the fetal brain such as posterior fossa, brainstem or germinal matrix. A first attempt at automated brain tissue segmentation has been presented in [7] and in our previous work [8]. Both methods apply the Expectation-Maximization Markov Random Field (EM-MRF) framework but, contrary to [7], we do not need any anatomical atlas prior. Data set & Methods. Prenatal MR imaging was performed with a 1-T system (GE Medical Systems, Milwaukee) using single shot fast spin echo (ssFSE) sequences (TR 7000 ms, TE 180 ms, FOV 40 x 40 cm, slice thickness 5.4 mm, in-plane spatial resolution 1.09 mm). Each fetus has 6 axial volumes (around 15 slices per volume), each of them acquired in about 1 min. Each volume is shifted by 1 mm with respect to the previous one. Gestational age (GA) ranges from 29 to 32 weeks. 
The mother is under sedation. Each volume is manually segmented to extract the fetal brain from surrounding maternal tissues. Then, intensity inhomogeneity correction is performed using [9] and linear intensity normalization is performed to have intensity values that range from 0 to 255. Note that due to intra-tissue variability of the developing brain some intensity variability still remains. For each fetus, a high spatial resolution image of isotropic voxel size of 1.09 mm is created applying [2] and using B-splines for the scattered data interpolation [10] (see Fig. 1). Then, basal ganglia (BG) segmentation is performed on this super reconstructed volume. An active contour framework with a Level Set (LS) implementation is used. Our LS follows a slightly different formulation from the well-known Chan-Vese [11] formulation. In our case, the LS evolves forcing the mean of the inside of the curve to be the mean intensity of basal ganglia. Moreover, we add a local spatial prior through a probabilistic map created by fitting an ellipsoid onto the basal ganglia region. Some user interaction is needed to set the mean intensity of BG (green dots in Fig. 2) and the initial fitting points for the probabilistic prior map (blue points in Fig. 2). Once basal ganglia are removed from the image, brain tissue segmentation is performed as described in [8]. Results. The case study presented here has 29 weeks of GA. The high resolution reconstructed volume is presented in Fig. 1. The steps of BG segmentation are shown in Fig. 2. Overlap in comparison with manual segmentation is quantified by the Dice similarity index (DSI) equal to 0.829 (values above 0.7 are considered a very good agreement). Such BG segmentation has been applied on 3 other subjects ranging from 29 to 32 weeks of GA and the DSI has been 0.856, 0.794 and 0.785. Our segmentation of the inner (red and blue contours) and outer cortical surface (green contour) is presented in Fig. 3. 
Finally, to refine the results we include our WM segmentation in the Freesurfer software [12] and some manual corrections to obtain Fig. 4. Discussion. Precise cortical surface extraction of the fetal brain is needed for quantitative studies of early human brain development. Our work combines the well-known statistical classification framework with active contour segmentation for central gray matter extraction. A main advantage of the presented procedure for fetal brain surface extraction is that we do not include any spatial prior coming from anatomical atlases. The results presented here are preliminary but promising. Our efforts are now focused on testing this approach on a wider range of gestational ages, which we will include in the final version of this work, and on studying its generalization to different scanners and different types of MRI sequences. References. [1] Guibaud, Prenatal Diagnosis 29(4), 2009. [2] Rousseau, Acad. Rad. 13(9), 2006. [3] Jiang, IEEE TMI, 2007. [4] Warfield, IADB, MICCAI 2009. [5] Claude, IEEE Trans. Bio. Eng. 51(4), 2004. [6] Habas, MICCAI (Pt. 1), 2008. [7] Bertelsen, ISMRM 2009. [8] Bach Cuadra, IADB, MICCAI 2009. [9] Styner, IEEE TMI 19(3), 2000. [10] Lee, IEEE Trans. Visual. and Comp. Graph. 3(3), 1997. [11] Chan, IEEE Trans. Img. Proc. 10(2), 2001. [12] Freesurfer, http://surfer.nmr.mgh.harvard.edu.
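The Dice similarity index used in the abstract above to compare automatic and manual segmentations is twice the number of voxels shared by the two masks divided by the sum of the two mask sizes. The sketch below is plain Python and represents each mask as a set of voxel coordinates. That representation, the function name and the toy masks are assumptions for illustration only.

```python
def dice_similarity(mask_a, mask_b):
    """Dice similarity index: 2 * |A and B| / (|A| + |B|) for two voxel sets."""
    a = set(mask_a)
    b = set(mask_b)
    if not a and not b:
        # Two empty masks agree perfectly by convention.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

# Toy masks of four voxels each, sharing three voxels.
automatic = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
manual = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
dsi = dice_similarity(automatic, manual)
```

The index runs from 0 (no overlap) to 1 (identical masks), which is the scale on which the reported values of 0.785 to 0.856 should be read.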