53 results for principal component analysis (PCA)
Abstract:
Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L² approximation. However, the eigenfunctions do not always point in an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome this difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.
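The optimality of the eigenfunction representation mentioned above can be stated compactly. The following is the standard textbook formulation (the Karhunen-Loève expansion), not a formula taken from the abstracted paper; the symbols X, \mu, C, \varphi_k, \lambda_k and \xi_k are notation assumed here for illustration.

```latex
% X: square-integrable random function on a domain T, with mean \mu and covariance C(s,t)
% Eigenfunctions \varphi_k of the covariance operator, ordered by decreasing eigenvalue:
\int_T C(s,t)\,\varphi_k(s)\,ds = \lambda_k\,\varphi_k(t),
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge 0

% Expansion of X in the eigenbasis, with uncorrelated scores \xi_k:
X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k\,\varphi_k(t),
\qquad \xi_k = \int_T \bigl(X(t)-\mu(t)\bigr)\,\varphi_k(t)\,dt

% Mean integrated squared error of the K-term truncation:
\mathbb{E}\int_T \Bigl( X(t)-\mu(t)-\sum_{k=1}^{K}\xi_k\,\varphi_k(t) \Bigr)^{2} dt
  = \sum_{k>K} \lambda_k
```

Among all sets of K orthonormal functions, the first K eigenfunctions give the smallest such error, which is the sense in which the representation is optimal.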
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
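The data-reduction check described above (random selection of a fixed number of spectra per region, then PCA to confirm that the regions remain distinguishable) might be sketched as follows. This is only an illustration on synthetic data: the array sizes, the peak positions, and the helper names `pca_first_component` and `subsample_region` are assumptions, not the authors' workflow or software.

```python
import numpy as np


def pca_first_component(X):
    """Return the unit-norm first principal axis of the rows of X."""
    Xc = X - X.mean(axis=0)              # center each channel
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def subsample_region(spectra, n_keep, rng):
    """Randomly keep n_keep spectra (rows) from one region, without replacement."""
    idx = rng.choice(spectra.shape[0], size=n_keep, replace=False)
    return spectra[idx]


rng = np.random.default_rng(0)
n_per_region, n_channels = 1000, 50

# Synthetic "spectra": two regions that differ in which channels are elevated.
tumor = rng.normal(size=(n_per_region, n_channels))
tumor[:, 5:10] += 5.0
normal = rng.normal(size=(n_per_region, n_channels))
normal[:, 30:35] += 5.0

full = np.vstack([tumor, normal])
# 10-fold reduction: keep 100 of the 1000 spectra in each region.
reduced = np.vstack([
    subsample_region(tumor, 100, rng),
    subsample_region(normal, 100, rng),
])

pc_full = pca_first_component(full)
pc_reduced = pca_first_component(reduced)

# Absolute cosine between the two leading axes (the sign of a PC is arbitrary).
agreement = abs(float(pc_full @ pc_reduced))
print(f"spectra: {full.shape[0]} -> {reduced.shape[0]}, PC1 agreement = {agreement:.3f}")
```

An agreement close to 1 indicates that the leading direction separating the regions survives the subsampling; on real data one would typically also compare score plots per region.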
Abstract:
Natural fluctuations in soil microbial communities are poorly documented because of the inherent difficulty of performing a simultaneous analysis of the relative abundances of multiple populations over a long time period. Yet, it is important to understand the magnitudes of community composition variability as a function of natural influences (e.g., temperature, plant growth, or rainfall) because this forms the reference or baseline against which external disturbances (e.g., anthropogenic emissions) can be judged. In addition, defining baseline fluctuations in complex microbial communities may help to understand at which point the systems become unbalanced and cannot return to their original composition. In this paper, we examined the seasonal fluctuations in the bacterial community of an agricultural soil used for regular plant crop production by using terminal restriction fragment length polymorphism profiling (T-RFLP) of the amplified 16S ribosomal ribonucleic acid (rRNA) gene diversity. Cluster and statistical analysis of T-RFLP data showed that soil bacterial communities fluctuated very little during the seasons (similarity indices between 0.835 and 0.997) with insignificant variations in 16S rRNA gene richness and diversity indices. Despite overall insignificant fluctuations, between 8 and 30% of all terminal restriction fragments changed their relative intensity in a significant manner among consecutive time samples. To determine the magnitude of community variations induced by external factors, soil samples were subjected to either inoculation with a pure bacterial culture, addition of the herbicide mecoprop, or addition of nutrients. All treatments resulted in statistically measurable changes of T-RFLP profiles of the communities. Addition of nutrients or bacteria plus mecoprop resulted in a bacterial composition that did not return to the original profile within 14 days. We propose that at less than 70% similarity in T-RFLP, the bacterial communities risk drifting apart to inherently different states.
Abstract:
Evaluation of segmentation methods is a crucial aspect in image processing, especially in the medical imaging field, where small differences between segmented regions in the anatomy can be of paramount importance. Usually, segmentation evaluation is based on a measure that depends on the number of segmented voxels inside and outside of some reference regions that are called gold standards. Although some other measures have also been used, in this work we propose a set of new similarity measures, based on different features, such as the location and intensity values of the misclassified voxels, and the connectivity and the boundaries of the segmented data. Using the multidimensional information provided by these measures, we propose a new evaluation method whose results are visualized by applying a Principal Component Analysis to the data, obtaining a simplified graphical method to compare different segmentation results. We have carried out an intensive study using several classic segmentation methods applied to a set of simulated MRI data of the brain with several noise and RF inhomogeneity levels, and also to real data, showing that the new measures proposed here and the results that we have obtained from the multidimensional evaluation improve the robustness of the evaluation and provide a better understanding of the differences between segmentation methods.
Abstract:
Although Leontopodium alpinum is considered to be threatened in many countries, only limited scientific information about its autecology is available. In this study, we aim to define the most important ecological factors which influence the distribution of L. alpinum in the Swiss Alps. These were assessed at the national scale using species distribution models based on topoclimatic predictors and at the community scale using exhaustive plant inventories. The latter were analysed using hierarchical clustering and principal component analysis, and the results were interpreted using ecological indicator values. L. alpinum was found almost exclusively on base-rich bedrocks (limestone and ultramafic rocks). The species distribution models showed that the available moisture (dry regions, mostly in the Inner Alps), elevation (mostly above 2000 m a.s.l.) and slope (mostly >30°) were the most important predictors. The relevés showed that L. alpinum is present in a wide range of plant communities, all subalpine-alpine open grasslands, with a low grass cover. As a light-demanding and short species, L. alpinum requires light at ground level; hence, it can only grow in open, nutrient-poor grasslands. These conditions are met in dry conditions (dry, summer-warm climate, rocky and draining soil, south-facing aspect and/or steep slope), at high elevations, on oligotrophic soils and/or on windy ridges. Base-rich soils also appear to be essential, although it is still unclear if this corresponds to physiological or ecological (lower competition) requirements.
Abstract:
The fatty acids from cocoa butters of different origins, varieties, and suppliers and a number of cocoa butter equivalents (Illexao 30-61, Illexao 30-71, Illexao 30-96, Choclin, Coberine, Chocosine-Illipe, Chocosine-Shea, Shokao, Akomax, Akonord, and Ertina) were investigated by bulk stable carbon isotope analysis and compound specific isotope analysis. The interpretation is based on principal component analysis combining the fatty acid concentrations and the bulk and molecular isotopic data. The scatterplot of the first two principal components allowed detection of the addition of vegetable fats to cocoa butters. Enrichment in the heavy carbon isotope (¹³C) of the bulk cocoa butter and of the individual fatty acids is related to mixing with other vegetable fats and possibly to thermally or oxidatively induced degradation during processing (e.g., drying and roasting of the cocoa beans or deodorization of the pressed fat) or storage. The feasibility of the analytical approach for authenticity assessment is discussed.
Abstract:
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual's DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.
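How a two-dimensional summary of genotypes can mirror geography may be illustrated with a small simulation. The sketch below assumes that the summary is obtained by PCA of a standardized genotype matrix (the abstract does not spell this out) and uses entirely synthetic individuals whose allele frequencies vary smoothly with location; all sizes and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_individuals, n_snps = 300, 2000

# Hypothetical sampling locations in a square "map".
coords = rng.uniform(-1.0, 1.0, size=(n_individuals, 2))

# Each SNP has its own weak geographic gradient in allele frequency.
gradients = rng.normal(size=(2, n_snps))
freqs = np.clip(0.5 + 0.1 * coords @ gradients, 0.05, 0.95)

# Genotypes: number of copies (0, 1 or 2) of the reference allele.
genotypes = rng.binomial(2, freqs).astype(float)

# Standardize each SNP, dropping any SNP that shows no variation.
std = genotypes.std(axis=0)
keep = std > 0
Z = (genotypes[:, keep] - genotypes[:, keep].mean(axis=0)) / std[keep]

# PCA via SVD; rows of `pcs` are individuals in the first two PCs.
U, S, _ = np.linalg.svd(Z, full_matrices=False)
pcs = U[:, :2] * S[:2]

# PCs are defined only up to rotation and sign, so measure how well a
# linear map from the two PCs reproduces each geographic coordinate.
centered = coords - coords.mean(axis=0)
beta, *_ = np.linalg.lstsq(pcs, centered, rcond=None)
residual = centered - pcs @ beta
r2 = 1.0 - (residual ** 2).sum(axis=0) / (centered ** 2).sum(axis=0)
print("R^2 of (x, y) predicted from PC1 and PC2:", np.round(r2, 3))
```

Because the principal axes may be rotated or mirrored relative to the map, the comparison uses a fitted linear map rather than a direct axis-by-axis correlation.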
Abstract:
Using one male-inherited and eight biparentally inherited microsatellite markers, we investigate the population genetic structure of the Valais chromosome race of the common shrew (Sorex araneus) in the Central Alps of Europe. Unexpectedly, the Y-chromosome microsatellite suggests nearly complete absence of male gene flow among populations from the St-Bernard and Simplon regions (Switzerland). Autosomal markers also show significant genetic structuring between these two geographical areas. Isolation by distance is significant and possible barriers to gene flow exist in the study area. Two different approaches are used to better understand the geographical patterns and the causes of this structuring. Using a principal component analysis for which a testing procedure exists, and partial Mantel tests, we show that the St-Bernard pass does not represent a significant barrier to gene flow although it culminates at 2469 m, close to the highest altitudinal record for this species. Similar results are found for the Simplon pass, indicating that both passes represented potential postglacial recolonization routes into Switzerland from Italian refugia after the last Pleistocene glaciations. In contrast with the weak effect of these mountain passes, the Rhône valley lowlands significantly reduce gene flow in this species. Natural obstacles (the large Rhône river) and unsuitable habitats (dry slopes) are both present in the valley. Moreover, anthropogenic changes to landscape structures are likely to have strongly reduced available habitats for this shrew in the lowlands, thereby promoting genetic differentiation of populations found on opposite sides of the Rhône valley.
Abstract:
Changes in gene expression are thought to underlie many of the phenotypic differences between species. However, large-scale analyses of gene expression evolution were until recently prevented by technological limitations. Here we report the sequencing of polyadenylated RNA from six organs across ten species that represent all major mammalian lineages (placentals, marsupials and monotremes) and birds (the evolutionary outgroup), with the goal of understanding the dynamics of mammalian transcriptome evolution. We show that the rate of gene expression evolution varies among organs, lineages and chromosomes, owing to differences in selective pressures: transcriptome change was slow in nervous tissues and rapid in testes, slower in rodents than in apes and monotremes, and rapid for the X chromosome right after its formation. Although gene expression evolution in mammals was strongly shaped by purifying selection, we identify numerous potentially selectively driven expression switches, which occurred at different rates across lineages and tissues and which probably contributed to the specific organ biology of various mammals.
Abstract:
Functionally relevant large scale brain dynamics operates within the framework imposed by anatomical connectivity and time delays due to finite transmission speeds. To gain insight on the reliability and comparability of large scale brain network simulations, we investigate the effects of variations in the anatomical connectivity. Two different sets of detailed global connectivity structures are explored, the first extracted from the CoCoMac database and rescaled to the spatial extent of the human brain, the second derived from white-matter tractography applied to diffusion spectrum imaging (DSI) for a human subject. We use the combination of graph theoretical measures of the connection matrices and numerical simulations to explicate the importance of both connectivity strength and delays in shaping dynamic behaviour. Our results demonstrate that the brain dynamics derived from the CoCoMac database are more complex and biologically more realistic than the one based on the DSI database. We propose that the reason for this difference is the absence of directed weights in the DSI connectivity matrix.
Abstract:
Our current knowledge of the general factor requirement in transcription by the three mammalian RNA polymerases is based on a small number of model promoters. Here, we present a comprehensive chromatin immunoprecipitation (ChIP)-on-chip analysis for 28 transcription factors on a large set of known and novel TATA-binding protein (TBP)-binding sites experimentally identified via ChIP cloning. A large fraction of identified TBP-binding sites is located in introns or lacks a gene/mRNA annotation and is found to direct transcription. Integrated analysis of the ChIP-on-chip data and functional studies revealed that TAF12 hitherto regarded as RNA polymerase II (RNAP II)-specific was found to be also involved in RNAP I transcription. Distinct profiles for general transcription factors and TAF-containing complexes were uncovered for RNAP II promoters located in CpG and non-CpG islands suggesting distinct transcription initiation pathways. Our study broadens the spectrum of general transcription factor function and uncovers a plethora of novel, functional TBP-binding sites in the human genome.
Abstract:
BACKGROUND: The criteria for choosing relevant cell lines among a vast panel of available intestinal-derived lines exhibiting a wide range of functional properties are still ill-defined. The objective of this study was, therefore, to establish objective criteria for choosing relevant cell lines to assess their appropriateness as tumor models as well as for drug absorption studies. RESULTS: We made use of publicly available expression signatures and cell-based functional assays to delineate differences between various intestinal colon carcinoma cell lines and normal intestinal epithelium. We have compared a panel of intestinal cell lines with patient-derived normal and tumor epithelium and classified them according to traits relating to oncogenic pathway activity, epithelial-mesenchymal transition (EMT) and stemness, migratory properties, proliferative activity, transporter expression profiles and chemosensitivity. For example, SW480 cells represent an EMT-high, migratory phenotype and scored highest in terms of signatures associated with worse overall survival and higher risk of recurrence based on patient-derived databases. On the other hand, differentiated HT29 and T84 cells showed gene expression patterns closest to tumor bulk derived cells. Regarding drug absorption, we confirmed that differentiated Caco-2 cells are the model of choice for active uptake studies in the small intestine. Regarding chemosensitivity, we were unable to confirm a recently proposed association of chemo-resistance with EMT traits. However, a novel signature was identified through mining of NCI60 GI50 values that allowed ranking of the panel of intestinal cell lines according to their drug responsiveness to commonly used chemotherapeutics. CONCLUSIONS: This study presents a straightforward strategy to exploit publicly available gene expression data to guide the choice of cell-based models. While this approach does not overcome the major limitations of such models, introducing a rank order of selected features may allow selecting model cell lines that are more adapted and pertinent to the addressed biological question.
Abstract:
Kinematic functional evaluation with body-worn sensors provides discriminative and responsive scores after shoulder surgery, but the optimal combination of movements has not yet been scientifically investigated. The aim of this study was the development of a simplified shoulder function kinematic score including only essential movements. The P Score, a seven-movement kinematic score developed on 31 healthy participants and 35 patients before surgery and at 3, 6 and 12 months after shoulder surgery, served as a reference. Principal component analysis and multiple regression were used to create simplified scoring models. The candidate models were compared to the reference score. ROC curve for shoulder pathology detection and correlations with clinical questionnaires were calculated. The B-B Score (hand to the Back and hand upwards as to change a Bulb) showed no difference to the P Score in time*score interaction (P > .05) and its relation with the reference score was highly linear (R² > .97). Absolute value of correlations with clinical questionnaires ranged from 0.51 to 0.77. Sensitivity was 97% and specificity 94%. The B-B and reference scores are equivalent for the measurement of group responses. The validated simplified scoring model presents practical advantages that facilitate the objective evaluation of shoulder function in clinical practice.
Abstract:
Mismatch negativity (MMN) overlaps with other auditory event-related potential (ERP) components. We examined the ERPs of fifty 9- to 11-year-old children for vowels /i/, /y/ and equivalent complex tones. The goal was to separate MMN from obligatory ERP components using principal component analysis and an equal-probability control condition. In addition to the contrast of the deviant minus standard response, we employed the contrast of the deviant minus control response, to see whether obligatory processing contributes to MMN in children. When looking for differences in the speech deviant minus standard contrast, MMN starts around 112 ms. However, when both contrasts are examined, MMN emerges for speech at 160 ms, whereas for nonspeech MMN is observed at 112 ms regardless of contrast. We argue that this discriminative response to speech stimuli at 112 ms is obligatory in nature rather than reflecting change detection processing.
Abstract:
A multivariate morphometric study of the Greater white-toothed shrew (C. russula) throughout its Palearctic range was carried out to search for patterns of geographic variation within the species boundary. Burnaby's and multiple group principal component analysis allowed the adjustment of raw data with respect to within-sample allometric variation. Multivariate 'size-free' results show a stepped cline with phenotypical trait reduction and shape change from the eastern to the western Maghreb. Pleistocene fossil mandibles proved to have low phenetic distances with eastern populations (Tunisia, east Algeria) and it is argued that their character set is the primitive condition. The ancestral Mid-Pleistocene shrews lived in a relatively more humid climate. Geo-climatic changes in the north African range during the Quaternary provoked phenetic variation of C. russula and, it can be argued, evolution of the modern western C. r. yebalensis. A historical process can thus be assumed as the main cause of this categorical variation, by segmentation of the species range due to geo-climatic events. Morphometric discontinuity within the C. russula Maghreb range is shown to be congruent with karyological and biochemical studies. Moroccan and Tunisian shrews differ, for example, in NFa chromosomes and electrophoretical traits. A stasipatric process should be invoked to explain categorical variation in the Maghreb range. Colonization and divergence of insular populations results in more or less differentiated geographic races. The populations of Ibiza and Pantelleria are close to the species threshold (Nei's D ≥ 0.1). The process of speciation undergone by the Greater white-toothed shrew results in a complex pattern of geographic variation, including both allopatric and non-allopatric modes.