929 results for Principal component analysis discriminant analysis
Abstract:
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
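As an illustration of the fuzzy coding and defuzzification mentioned in the abstract above, the following Python sketch codes a continuous value into three fuzzy categories using triangular membership functions and then recovers the original value. The choice of triangular functions, the hinge points, the function names and the sample value are all assumptions made for illustration; the abstract does not specify them.

```python
def fuzzy_code(x, hinges):
    """Code a value into len(hinges) fuzzy categories (hypothetical helper).

    Uses triangular membership functions anchored at the hinge points, so
    memberships are non-negative and sum to 1.
    """
    if any(a >= b for a, b in zip(hinges, hinges[1:])):
        raise ValueError("hinges must be strictly increasing")
    # Values outside the hinge range are clamped to the nearest end category.
    x = min(max(x, hinges[0]), hinges[-1])
    memberships = [0.0] * len(hinges)
    for k in range(len(hinges) - 1):
        lo, hi = hinges[k], hinges[k + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            memberships[k] = 1.0 - t
            memberships[k + 1] = t
            break
    return memberships


def defuzzify(memberships, hinges):
    """Estimate the original value as the membership-weighted mean of the hinges."""
    return sum(m * h for m, h in zip(memberships, hinges))


# Assumed hinge points (for example minimum, median and maximum of a variable).
hinges = [0.0, 10.0, 30.0]
coded = fuzzy_code(15.0, hinges)
print(coded)                     # [0.0, 0.75, 0.25]
print(defuzzify(coded, hinges))  # 15.0
```

With this kind of coding, a value inside the hinge range is recovered exactly from its own fuzzy codes. In an analysis, defuzzification would be applied to the fitted memberships rather than the observed ones, which is what makes a variance-based measure of fit possible.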
Abstract:
The singular value decomposition and its interpretation as a linear biplot has proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
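A minimal sketch of how a biplot of compositional data can be obtained from the singular value decomposition is given below. It assumes the common unweighted recipe (log-transform, double-centre, then decompose) and a particular scaling of the coordinates; the function name `logratio_biplot` and the small data matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def logratio_biplot(X, n_dims=2):
    """Return sample coordinates, part coordinates and explained variance.

    Sketch only: unweighted log-ratio analysis of a samples-by-parts matrix.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("compositional parts must be strictly positive")
    X = X / X.sum(axis=1, keepdims=True)   # close each row to unit sum
    L = np.log(X)
    # Double-centre the log matrix: remove row means and column means.
    Z = (L - L.mean(axis=1, keepdims=True)
           - L.mean(axis=0, keepdims=True) + L.mean())
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    rows = U[:, :n_dims] * s[:n_dims]      # samples, scaled by singular values
    cols = Vt[:n_dims].T                   # parts, unit-length axes
    explained = s[:n_dims] ** 2 / np.sum(s ** 2)
    return rows, cols, explained


# Hypothetical three-part compositions for four samples (rows sum to 1).
comps = [[0.20, 0.30, 0.50],
         [0.10, 0.60, 0.30],
         [0.40, 0.40, 0.20],
         [0.25, 0.25, 0.50]]
rows, cols, explained = logratio_biplot(comps)
print(explained)  # proportion of log-ratio variance on each displayed axis
```

Under this scaling, the difference between two part coordinates represents the log-ratio of those two parts, which is why such displays are suited to studying ratios. A three-part composition has at most two dimensions after double-centring, so the two axes here display all of the variance.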
Abstract:
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
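The remark that standardization acts as differential weighting of the variables can be checked with a small Python sketch. The weights here are fixed at the inverse variances rather than estimated by the majorization algorithm the abstract describes, and the function name and data values are hypothetical.

```python
import math
import statistics


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical cases-by-variables data with very different variable scales.
data = [[1.0, 200.0], [2.0, 180.0], [4.0, 260.0], [5.0, 240.0]]
sds = [statistics.stdev(col) for col in zip(*data)]

# Distance on the raw data with weights equal to the inverse variances ...
weights = [1.0 / sd ** 2 for sd in sds]
d_weighted = weighted_euclidean(data[0], data[2], weights)

# ... equals the unweighted distance on the standardized data.
standardized = [[v / sd for v, sd in zip(row, sds)] for row in data]
d_standardized = weighted_euclidean(standardized[0], standardized[2], [1.0, 1.0])

print(math.isclose(d_weighted, d_standardized))  # True
```

In the method described above the weights are unknown parameters fitted to a target distance or dissimilarity; the sketch only shows the distance function those weights enter.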
Abstract:
Changes in gene expression are thought to underlie many of the phenotypic differences between species. However, large-scale analyses of gene expression evolution were until recently prevented by technological limitations. Here we report the sequencing of polyadenylated RNA from six organs across ten species that represent all major mammalian lineages (placentals, marsupials and monotremes) and birds (the evolutionary outgroup), with the goal of understanding the dynamics of mammalian transcriptome evolution. We show that the rate of gene expression evolution varies among organs, lineages and chromosomes, owing to differences in selective pressures: transcriptome change was slow in nervous tissues and rapid in testes, slower in rodents than in apes and monotremes, and rapid for the X chromosome right after its formation. Although gene expression evolution in mammals was strongly shaped by purifying selection, we identify numerous potentially selectively driven expression switches, which occurred at different rates across lineages and tissues and which probably contributed to the specific organ biology of various mammals.
Abstract:
Functionally relevant large scale brain dynamics operates within the framework imposed by anatomical connectivity and time delays due to finite transmission speeds. To gain insight on the reliability and comparability of large scale brain network simulations, we investigate the effects of variations in the anatomical connectivity. Two different sets of detailed global connectivity structures are explored, the first extracted from the CoCoMac database and rescaled to the spatial extent of the human brain, the second derived from white-matter tractography applied to diffusion spectrum imaging (DSI) for a human subject. We use the combination of graph theoretical measures of the connection matrices and numerical simulations to explicate the importance of both connectivity strength and delays in shaping dynamic behaviour. Our results demonstrate that the brain dynamics derived from the CoCoMac database are more complex and biologically more realistic than the one based on the DSI database. We propose that the reason for this difference is the absence of directed weights in the DSI connectivity matrix.
Abstract:
Our current knowledge of the general factor requirement in transcription by the three mammalian RNA polymerases is based on a small number of model promoters. Here, we present a comprehensive chromatin immunoprecipitation (ChIP)-on-chip analysis for 28 transcription factors on a large set of known and novel TATA-binding protein (TBP)-binding sites experimentally identified via ChIP cloning. A large fraction of identified TBP-binding sites is located in introns or lacks a gene/mRNA annotation and is found to direct transcription. Integrated analysis of the ChIP-on-chip data and functional studies revealed that TAF12, hitherto regarded as RNA polymerase II (RNAP II)-specific, is also involved in RNAP I transcription. Distinct profiles for general transcription factors and TAF-containing complexes were uncovered for RNAP II promoters located in CpG and non-CpG islands, suggesting distinct transcription initiation pathways. Our study broadens the spectrum of general transcription factor function and uncovers a plethora of novel, functional TBP-binding sites in the human genome.
Abstract:
BACKGROUND: The criteria for choosing relevant cell lines among a vast panel of available intestinal-derived lines exhibiting a wide range of functional properties are still ill-defined. The objective of this study was, therefore, to establish objective criteria for choosing relevant cell lines to assess their appropriateness as tumor models as well as for drug absorption studies. RESULTS: We made use of publicly available expression signatures and cell-based functional assays to delineate differences between various intestinal colon carcinoma cell lines and normal intestinal epithelium. We have compared a panel of intestinal cell lines with patient-derived normal and tumor epithelium and classified them according to traits relating to oncogenic pathway activity, epithelial-mesenchymal transition (EMT) and stemness, migratory properties, proliferative activity, transporter expression profiles and chemosensitivity. For example, SW480 cells represent an EMT-high, migratory phenotype and scored highest in terms of signatures associated with worse overall survival and higher risk of recurrence based on patient-derived databases. On the other hand, differentiated HT29 and T84 cells showed gene expression patterns closest to tumor bulk-derived cells. Regarding drug absorption, we confirmed that differentiated Caco-2 cells are the model of choice for active uptake studies in the small intestine. Regarding chemosensitivity, we were unable to confirm a recently proposed association of chemo-resistance with EMT traits. However, a novel signature was identified through mining of NCI60 GI50 values that allowed the panel of intestinal cell lines to be ranked according to their responsiveness to commonly used chemotherapeutics. CONCLUSIONS: This study presents a straightforward strategy to exploit publicly available gene expression data to guide the choice of cell-based models.
While this approach does not overcome the major limitations of such models, introducing a rank order of selected features may allow selecting model cell lines that are more adapted and pertinent to the addressed biological question.
Abstract:
It is well known that Amazon tropical forest soils contain high microbial biodiversity. However, anthropogenic actions of slash and burn, mainly for pasture establishment, induce profound changes in the well-balanced biogeochemical cycles. After a few years the grass yield usually declines, the pasture is abandoned and is transformed into a secondary vegetation called "capoeira" or fallow. The aim of this study was to examine how the clearing of Amazon rainforest for pasture affects: (1) the diversity of the Bacteria domain evaluated by Polymerase Chain Reaction and Denaturing Gradient Gel Electrophoresis (PCR-DGGE), (2) microbial biomass and some soil chemical properties (pH, moisture, P, K, Ca, Mg, Al, H + Al, and BS), and (3) the influence of environmental variables on the genetic structure of the bacterial community. In the pasture soil, total carbon (C) was 30 to 42 % higher than in the fallow, and almost 47 % higher than in the forest soil over a year. The same pattern was observed for N. Microbial biomass in the pasture was about 38 and 26 % higher than at fallow and forest sites, respectively, in the rainy season. DGGE profiling revealed a lower number of bands per area in the dry season, but differences in the structure of bacterial communities among sites were better defined than in the wet season. The bacterial DNA fingerprints in the forest were more strongly related to Al content and the Cmic:Ctot and Nmic:Ntot ratios. For pasture and fallow sites, the structure of the Bacteria domain was more associated with pH, sum of bases, moisture, total C and N and the microbial biomass. In general, microbial biomass in the soils was influenced by total C and N, which were associated with the Bacteria domain, since the bacterial community is a component and active fraction of the microbial biomass. Results show that the genetic composition of bacterial communities in Amazonian soils changed along the sequence forest-pasture-fallow.
Abstract:
The majority (60 %) of the soils in the Venezuelan Andes are Inceptisols, a large percentage of which are classified as Dystrustepts by the US Soil Taxonomy, Second Edition of 1999. Some of these soils were classified as Humitropepts (soils high in organic carbon, OC) and Dystropepts by the Soil Taxonomy prior to 1999, but no equivalent great group was created for high-OC soils in the new Ustepts suborder. Dystrustepts developed on different materials, relief and vegetation. Their properties are closely related to the parent material. Soils developed on transported deposits or sediments have darker and thicker A horizons, a slightly acid reaction, and greater CEC and OC contents than upland slope soils. Based on the previous classification into great groups (Humitropepts and Dystropepts), we found that Humitropepts have a slightly less acid reaction and higher CEC values than Dystropepts. These properties seem to be related to the fact that Humitropepts have a higher clay and OC content than the Dystropepts. Canonical discriminant analysis showed that the variables that discriminate the two great groups from each other are OC and silt. Data for Humitropepts are grouped around the OC vector (defining axis 3, principal component analysis), while Dystropepts are associated with the clay and sand vectors, with significant correlation. Given the importance of OC for soil properties, we propose the creation of a new great group named Humustepts for the order Inceptisol, suborder Ustepts.
Abstract:
Soil science has sought to develop better techniques for the classification of soils, one of which is the use of remote sensing applications. The use of ground sensors to obtain soil spectral data has enabled the characterization of these data and the advancement of techniques for the quantification of soil attributes. In order to do this, the creation of a soil spectral library is necessary. A spectral library should be representative of the variability of the soils in a region. The objective of this study was to create a spectral library of distinct soils from several agricultural regions of Brazil. Spectral data were collected (using a Fieldspec sensor, 350-2,500 nm) for the horizons of 223 soil profiles from the regions of Matão, Paraguaçu Paulista, Andradina, Ipaussu, Mirandópolis, Piracicaba, São Carlos, Araraquara, Guararapes, Valparaíso (SP); Naviraí, Maracajú, Rio Brilhante, Três Lagoas (MS); Goianésia (GO); and Uberaba and Lagoa da Prata (MG). A Principal Component Analysis (PCA) of the data was then performed and a graphic representation of the spectral curve was created for each profile. The reflectance intensity of the curves was principally influenced by the levels of Fe2O3, clay, organic matter and the presence of opaque minerals. There was no change in the spectral curves in the horizons of the Latossolos, Nitossolos, and Neossolos Quartzarênicos. Argissolos had superficial horizon curves with the greatest intensity of reflection above 2,200 nm. Cambissolos and Neossolos Litólicos had curves with greater reflectance intensity in poorly developed horizons. Gleisols showed a convex curve in the region of 350-400 nm. The PCA was able to separate different data collection areas according to the region of source material. Principal component one (PC1) was correlated with the intensity of reflectance samples and PC2 with the slope between the visible and infrared samples. 
The use of the Spectral Library as an indicator of possible soil classes proved to be an important tool in profile classification.
Abstract:
The structural stability and restructuring ability of a soil are related to the methods of crop management and soil preparation. A recommended strategy to reduce the effects of soil preparation is to use crop rotation and cover crops that help conserve and restore the soil structure. The aim of this study was to evaluate and quantify the homogeneous morphological units in soil under conventional mechanized tillage and animal traction, as well as to assess the effect on the soil structure of intercropping with jack bean (Canavalia ensiformis L.). Profiles were analyzed in April of 2006, in five counties in the Southern-Central region of Paraná State (Brazil), on family farms producing maize (Zea mays L.), sometimes intercropped with jack bean. The current structures in the crop profile were analyzed using Geographic Information Systems (GIS) and subsequently principal component analysis (PCA) to generate statistics. Morphostructural soil analysis showed a predominance of compact units in areas of high-intensity cultivation under mechanized traction. The cover crop did not improve the structure of the soil with low porosity and compact units that hamper the root system growth. In areas exposed to animal traction, a predominance of cracked units was observed, where roots grew around the clods and along the gaps between them.
Abstract:
Kinematic functional evaluation with body-worn sensors provides discriminative and responsive scores after shoulder surgery, but the optimal combination of movements has not yet been scientifically investigated. The aim of this study was the development of a simplified shoulder function kinematic score including only essential movements. The P Score, a seven-movement kinematic score developed on 31 healthy participants and 35 patients before surgery and at 3, 6 and 12 months after shoulder surgery, served as a reference. Principal component analysis and multiple regression were used to create simplified scoring models. The candidate models were compared to the reference score. ROC curve for shoulder pathology detection and correlations with clinical questionnaires were calculated. The B-B Score (hand to the Back and hand upwards as to change a Bulb) showed no difference to the P Score in time*score interaction (P > .05) and its relation with the reference score was highly linear (R² > .97). Absolute value of correlations with clinical questionnaires ranged from 0.51 to 0.77. Sensitivity was 97% and specificity 94%. The B-B and reference scores are equivalent for the measurement of group responses. The validated simplified scoring model presents practical advantages that facilitate the objective evaluation of shoulder function in clinical practice.
Abstract:
The agricultural potential is generally assessed and managed based on a one-dimensional vision of the soil profile. However, the increased appreciation of sustainable production has stimulated studies on faster and more accurate techniques and methods for evaluating the agricultural potential on detailed scales. The objective of this study was to investigate the possibility of using soil magnetic susceptibility for the identification of landscape segments on a detailed scale in the region of Jaboticabal, São Paulo State. The studied area has two slope curvatures, linear and concave, subdivided into three landscape segments: upper slope (US, concave), middle slope (MS, linear) and lower slope (LS, linear). In each of these segments, 20 points were randomly sampled from a database with 207 samples forming a regular grid installed in each landscape segment. The soil physical and chemical properties, CO2 emissions (FCO2) and magnetic susceptibility (MS) of the samples were evaluated, the latter represented by magnetic susceptibility of air-dried fine earth (MS ADFE), magnetic susceptibility of the total sand fraction (MS TS) and magnetic susceptibility of the clay fraction (MS Cl) in the 0.00 - 0.15 m layer. The principal component analysis showed that MS is an important property that can be used to identify landscape segments, because the correlation of this property with the first principal component was high. The hierarchical cluster analysis method identified two groups based on the variables selected by principal component analysis; of the six selected variables, three were related to magnetic susceptibility. The landscape segments were differentiated similarly by the principal component analysis and by the cluster analysis using only the properties with higher discriminatory power. The cluster analysis of MS ADFE, MS TS and MS Cl allowed the formation of three groups that agree with the segment division established in the field.
The grouping by cluster analysis indicated MS as a tool that could facilitate the identification of landscape segments and enable the mapping of more homogeneous areas at similar locations.
Abstract:
Since different pedologists will draw different soil maps of the same area, it is important to compare mapping by specialists with mapping techniques such as the currently much-discussed Digital Soil Mapping. Four detailed soil maps (scale 1:10,000) of a 182-ha sugarcane farm in the county of Rafard, São Paulo State, Brazil, were compared. The area has a large variation of soil formation factors. The maps were drawn independently by four soil scientists and compared with a fifth map obtained by a digital soil mapping technique. All pedologists were given the same set of information. As many field expeditions and soil pits as required by each surveyor were provided to define the mapping units (MUs). For the Digital Soil Map (DSM), spectral data were extracted from Landsat 5 Thematic Mapper (TM) imagery as well as six terrain attributes from the topographic map of the area. These data were summarized by principal component analysis to generate the map designs of groups through Fuzzy K-means clustering. Field observations were made to identify the soils in the MUs and classify them according to the Brazilian Soil Classification System (BSCS). To compare the conventional and digital (DSM) soil maps, they were crossed pairwise to generate confusion matrices that were mapped. The categorical analysis at each classification level of the BSCS showed that the agreement between the maps decreased towards the lower levels of classification and revealed the great influence of the surveyor on both the mapping and definition of MUs in the soil map. The average correspondence between the conventional and DSM maps was similar. Therefore, the method used to obtain the DSM yielded similar results to those obtained by the conventional technique, while providing additional information about the landscape of each soil, useful for applications in future surveys of similar areas.
Abstract:
Considering that information from soil reflectance spectra is underutilized in soil classification, this paper aimed to evaluate the relationship between soil physical and chemical properties and their spectra, to identify spectral patterns for soil classes, and to evaluate the use of numerical classification of profiles combined with spectral data for soil classification. We studied 20 soil profiles from the municipality of Piracicaba, State of São Paulo, Brazil, which were morphologically described and classified up to the 3rd category level of the Brazilian Soil Classification System (SiBCS). Subsequently, soil samples were collected from pedogenetic horizons and subjected to soil particle size and chemical analyses. Their Vis-NIR spectra were measured, followed by principal component analysis. Pearson's linear correlation coefficients were determined among the four principal components and the following soil properties: pH, organic matter, P, K, Ca, Mg, Al, CEC, base saturation, and Al saturation. We also carried out interpretation of the first three principal components and their relationships with soil classes defined by SiBCS. In addition, numerical classification of the profiles based on the OSACA algorithm was performed using spectral data as a basis. We determined the Normalized Mutual Information (NMI) and Uncertainty Coefficient (U). These coefficients represent the similarity between the numerical classification and the soil classes from SiBCS. Pearson's correlation coefficients were significant for the principal components when compared to sand, clay, Al content and soil color. Visual analysis of the principal component scores showed differences in the spectral behavior of the soil classes, mainly between Argissolos and the other soils. The NMI and U similarity coefficients showed values of 0.74 and 0.64, respectively, suggesting good similarity between the numerical and SiBCS classes.
For example, numerical classification correctly distinguished Argissolos from Latossolos and Nitossolos. However, this mathematical technique was not able to distinguish Latossolos from Nitossolos Vermelho férricos, but the Cambissolos were well differentiated from other soil classes. The numerical technique proved to be effective and applicable to the soil classification process.
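To make the two similarity coefficients named in the abstract above concrete, the following Python sketch computes a normalized mutual information and an uncertainty coefficient between a reference classification and a numerical clustering. The normalization used for NMI (twice the mutual information divided by the sum of the entropies) is an assumption, since several variants exist and the abstract does not say which was used; the labels are hypothetical and do not reproduce the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def nmi(a, b):
    """NMI with arithmetic-mean normalization (one of several conventions)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 and hb == 0:
        return 1.0
    return 2.0 * mutual_information(a, b) / (ha + hb)


def uncertainty_coefficient(target, given):
    """Fraction of the entropy of `target` explained by knowing `given`."""
    h = entropy(target)
    return mutual_information(target, given) / h if h > 0 else 1.0


# Hypothetical labels: three reference classes, two numerical clusters.
reference = ["A", "A", "L", "L", "N", "N"]
clusters = [1, 1, 2, 2, 2, 2]

print(nmi(reference, clusters))
print(uncertainty_coefficient(reference, clusters))
```

In this toy case the clustering separates class A but merges L and N, so both coefficients fall between 0 and 1. The uncertainty coefficient is asymmetric: swapping its arguments asks how much of the clustering is explained by the reference classes instead.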