159 resultados para Multivariate data
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
In this paper, we introduce a Bayesian analysis for survival multivariate data in the presence of a covariate vector and censored observations. Different ""frailties"" or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized Gamma distributions considering right censored lifetime data. We develop the Bayesian analysis using Markov Chain Monte Carlo (MCMC) methods.
Resumo:
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
Resumo:
The efficacy of fluorescence spectroscopy to detect squamous cell carcinoma is evaluated in an animal model following laser excitation at 442 and 532 nm. Lesions are chemically induced with a topical DMBA application at the left lateral tongue of Golden Syrian hamsters. The animals are investigated every 2 weeks after the 4th week of induction until a total of 26 weeks. The right lateral tongue of each animal is considered as a control site (normal contralateral tissue) and the induced lesions are analyzed as a set of points covering the entire clinically detectable area. Based on fluorescence spectral differences, four indices are determined to discriminate normal and carcinoma tissues, based on intraspectral analysis. The spectral data are also analyzed using a multivariate data analysis and the results are compared with histology as the diagnostic gold standard. The best result achieved is for blue excitation using the KNN (K-nearest neighbor, a interspectral analysis) algorithm with a sensitivity of 95.7% and a specificity of 91.6%. These high indices indicate that fluorescence spectroscopy may constitute a fast noninvasive auxiliary tool for diagnostic of cancer within the oral cavity. (C) 2008 Society of Photo-Optical Instrumentation Engineers.
Resumo:
This paper aims to find relations between the socioeconomic characteristics, activity participation, land use patterns and travel behavior of the residents in the Sao Paulo Metropolitan Area (SPMA) by using Exploratory Multivariate Data Analysis (EMDA) techniques. The variables influencing travel pattern choices are investigated using: (a) Cluster Analysis (CA), grouping and characterizing the Traffic Zones (17), proposing the independent variable called Origin Cluster and, (b) Decision Tree (DT) to find a priori unknown relations among socioeconomic characteristics, land use attributes of the origin TZ and destination choices. The analysis was based on the origin-destination home-interview survey carried out in SPMA in 1997. The DT application revealed the variables of greatest influence on the travel pattern choice. The most important independent variable considered by DT is car ownership, followed by the Use of Transportation ""credits"" for Transit tariff, and, finally, activity participation variables and Origin Cluster. With these results, it was possible to analyze the influence of a family income, car ownership, position of the individual in the family, use of transportation ""credits"" for transit tariff (mainly for travel mode sequence choice), activities participation (activity sequence choice) and Origin Cluster (destination/travel distance choice). (c) 2010 Elsevier Ltd. All rights reserved.
Resumo:
There is a need of scientific evidence of claimed nutraceutical effects, but also there is a social movement towards the use of natural products and among them algae are seen as rich resources. Within this scenario, the development of methodology for rapid and reliable assessment of markers of efficiency and security of these extracts is necessary. The rat treated with streptozotocin has been proposed as the most appropriate model of systemic oxidative stress for studying antioxidant therapies. Cystoseira is a brown alga containing fucoxanthin and other carothenes whose pressure-assisted extracts were assayed to discover a possible beneficial effect on complications related to diabetes evolution in an acute but short-term model. Urine was selected as the sample and CE-TOF-MS as the analytical technique to obtain the fingerprints in a non-target metabolomic approach. Multivariate data analysis revealed a good clustering of the groups and permitted the putative assignment of compounds statistically significant in the classification. Interestingly a group of compounds associated to lysine glycation and cleavage from proteins was found to be increased in diabetic animals receiving vehicle as compared to control animals receiving vehicle (N6, N6, N6-trimethyl-L-lysine, N-methylnicotinamide, galactosylhydroxylysine, L-carnitine, N6-acetyl-N6-hydroxylysine, fructose-lysine, pipecolic acid, urocanic acid, amino-isobutanoate, formylisoglutamine. Fructoselysine significantly decreased after the treatment changing from a 24% increase to a 19% decrease. CE-MS fingerprinting of urine has provided a group of compounds different to those detected with other techniques and therefore proves the necessity of a cross-platform analysis to obtain a broad view of biological samples.
Resumo:
The aim of the present study was to evaluate the effect of soil characteristics (pH, macro- and micro-nutrients), environmental factors (temperature, humidity, period of the year and time of day of collection) and meteorological conditions (rain, sun, cloud and cloud/rain) on the flavonoid content of leaves of Passiflora incarnata L., Passifloraceae. The total flavonoid contents of leaf samples harvested from plants cultivated or collected under different conditions were quantified by high-performance liquid chromatography with ultraviolet detection (HPLC-UV/PAD). Chemometric treatment of the data by principal component (PCA) and hierarchic cluster analyses (HCA) showed that the samples did not present a specific classification in relation to the environmental and soil variables studied, and that the environmental variables were not significant in describing the data set. However, the levels of the elements Fe, B and Cu present in the soil showed an inverse correlation with the total flavonoid contents of the leaves of P. incarnata.
Resumo:
The application of laser induced breakdown spectrometry (LIBS) aiming the direct analysis of plant materials is a great challenge that still needs efforts for its development and validation. In this way, a series of experimental approaches has been carried out in order to show that LIBS can be used as an alternative method to wet acid digestions based methods for analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulties for selecting the most appropriated wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by the application of multivariate calibration in LIBS data when compared to the univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, these models showed a similar performance. but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. Data suggests that efforts dealing with sample presentation and fitness of standards for LIBS analysis must be done in order to fulfill the boundary conditions for matrix independent development and validation. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Fourier transform near infrared (FT-NIR) spectroscopy was evaluated as an analytical too[ for monitoring residual Lignin, kappa number and hexenuronic acids (HexA) content in kraft pulps of Eucalyptus globulus. Sets of pulp samples were prepared under different cooking conditions to obtain a wide range of compound concentrations that were characterised by conventional wet chemistry analytical methods. The sample group was also analysed using FT-NIR spectroscopy in order to establish prediction models for the pulp characteristics. Several models were applied to correlate chemical composition in samples with the NIR spectral data by means of PCR or PLS algorithms. Calibration curves were built by using all the spectral data or selected regions. Best calibration models for the quantification of lignin, kappa and HexA were proposed presenting R-2 values of 0.99. Calibration models were used to predict pulp titers of 20 external samples in a validation set. The lignin concentration and kappa number in the range of 1.4-18% and 8-62, respectively, were predicted fairly accurately (standard error of prediction, SEP 1.1% for lignin and 2.9 for kappa). The HexA concentration (range of 5-71 mmol kg(-1) pulp) was more difficult to predict and the SEP was 7.0 mmol kg(-1) pulp in a model of HexA quantified by an ultraviolet (UV) technique and 6.1 mmol kg(-1) pulp in a model of HexA quantified by anion-exchange chromatography (AEC). Even in wet chemical procedures used for HexA determination, there is no good agreement between methods as demonstrated by the UV and AEC methods described in the present work. NIR spectroscopy did provide a rapid estimate of HexA content in kraft pulps prepared in routine cooking experiments.
Resumo:
For the first time, we introduce and study some mathematical properties of the Kumaraswamy Weibull distribution that is a quite flexible model in analyzing positive data. It contains as special sub-models the exponentiated Weibull, exponentiated Rayleigh, exponentiated exponential, Weibull and also the new Kumaraswamy exponential distribution. We provide explicit expressions for the moments and moment generating function. We examine the asymptotic distributions of the extreme values. Explicit expressions are derived for the mean deviations, Bonferroni and Lorenz curves, reliability and Renyi entropy. The moments of the order statistics are calculated. We also discuss the estimation of the parameters by maximum likelihood. We obtain the expected information matrix. We provide applications involving two real data sets on failure times. Finally, some multivariate generalizations of the Kumaraswamy Weibull distribution are discussed. (C) 2010 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
Resumo:
Proteinuria was associated with cardiovascular events and mortality in community-based cohorts. The association of proteinuria with mortality and cardiovascular events in patients undergoing percutaneous coronary intervention (PCI) was unknown. The association of urinary dipstick proteinuria with mortality and cardiovascular events (composite of death, myocardial infarction, or nonhemorrhagic stroke) in 5,835 subjects of the EXCITE trial was evaluated. Dipstick urinalysis was performed before PCI, and proteinuria was defined as trace or greater. Subjects were followed up for 210 days/7 months after enrollment for the occurrence of events. Multivariate Cox regression analysis evaluated the independent association of proteinuria with each outcome. Mean age was 59 years, 21% were women, 18% had diabetes mellitus, and mean estimated glomerular filtration rate was 90 ml/min/1.73 m(2). Proteinuria was present in 750 patients (13%). During follow-up, 22 subjects (2.9%) with proteinuria and 54 subjects (1.1%) without proteinuria died (adjusted hazard ratio 2.83, 95% confidence interval [CI] 1.65 to 4.84, p <0.001). The severity of proteinuria attenuated the strength of the association with mortality after PCI (low-grade proteinuria, hazard ratio 2.67, 95% CI 1.50 to 4.75; high-grade proteinuria, hazard ratio 3.76, 95% CI 1.24 to 11.37). No significant association was present for cardiovascular events during the relatively short follow-up, but high-grade proteinuria tended toward increased risk of cardiovascular events (hazard ratio 1.45, 95% CI 0.81 to 2.61). In conclusion, proteinuria was strongly and independently associated with mortality in patients undergoing PCI. These data suggest that such a relatively simple and clinically easy to use tool as urinary dipstick may be useful to identify and treat patients at high risk of mortality at the time of PCI. (C) 2008 Elsevier Inc. All rights reserved. (Am J Cardiol 2008;102:1151-1155)
Resumo:
The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the aim of interest of many studies in neuroscience. The complex neural network structure and its correlations with brain functions have played a role in all areas of neuroscience, including the comprehension of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, in functional neuroimaging, connectivity analysis is a major tool for the exploration and characterization of the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROI) and then calculating an average BOLD time series (across the voxels in each cluster). Some studies have shown that the average may not be a good choice and have suggested, as an alternative, the use of principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method was to employ multiple eigen-time series in each ROI to avoid temporal information loss during identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single ""representative"" time series per ROI). This, in turn, may lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping. By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.
Resumo:
Functional magnetic resonance imaging (fMRI) is currently one of the most widely used methods for studying human brain function in vivo. Although many different approaches to fMRI analysis are available, the most widely used methods employ so called ""mass-univariate"" modeling of responses in a voxel-by-voxel fashion to construct activation maps. However, it is well known that many brain processes involve networks of interacting regions and for this reason multivariate analyses might seem to be attractive alternatives to univariate approaches. The current paper focuses on one multivariate application of statistical learning theory: the statistical discrimination maps (SDM) based on support vector machine, and seeks to establish some possible interpretations when the results differ from univariate `approaches. In fact, when there are changes not only on the activation level of two conditions but also on functional connectivity, SDM seems more informative. We addressed this question using both simulations and applications to real data. We have shown that the combined use of univariate approaches and SDM yields significant new insights into brain activations not available using univariate methods alone. In the application to a visual working memory fMRI data, we demonstrated that the interaction among brain regions play a role in SDM`s power to detect discriminative voxels. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
This paper presents a GIS-based multicriteria flood risk assessment and mapping approach applied to coastal drainage basins where hydrological data are not available. It involves risk to different types of possible processes: coastal inundation (storm surge), river, estuarine and flash flood, either at urban or natural areas, and fords. Based on the causes of these processes, several environmental indicators were taken to build-up the risk assessment. Geoindicators include geological-geomorphologic proprieties of Quaternary sedimentary units, water table, drainage basin morphometry, coastal dynamics, beach morphodynamics and microclimatic characteristics. Bioindicators involve coastal plain and low slope native vegetation categories and two alteration states. Anthropogenic indicators encompass land use categories properties such as: type, occupation density, urban structure type and occupation consolidation degree. The selected indicators were stored within an expert Geoenvironmental Information System developed for the State of Sao Paulo Coastal Zone (SIIGAL), which attributes were mathematically classified through deterministic approaches, in order to estimate natural susceptibilities (Sn), human-induced susceptibilities (Sa), return period of rain events (Ri), potential damages (Dp) and the risk classification (R), according to the equation R=(Sn.Sa.Ri).Dp. Thematic maps were automatically processed within the SIIGAL, in which automata cells (""geoenvironmental management units"") aggregating geological-geomorphologic and land use/native vegetation categories were the units of classification. The method has been applied to the Northern Littoral of the State of Sao Paulo (Brazil) in 32 small drainage basins, demonstrating to be very useful for coastal zone public politics, civil defense programs and flood management.
Resumo:
A joint transcriptomic and proteomic approach employing two-dimensional electrophoresis, liquid chromatography and mass spectrometry was carried out to identify peptides and proteins expressed by the venom gland of the snake Bothrops insularis, an endemic species of Queimada Grande Island, Brazil. Four protein families were mainly represented in processed spots, namely metalloproteinase, serine proteinase, phospholipase A(2) and lectin. Other represented families were growth factors, the developmental protein G10, a disintegrin and putative novel bradykinin-potentiating peptides. The enzymes were present in several isoforms. Most of the experimental data agreed with predicted values for isoelectric point and M(r) of proteins found in the transcriptome of the venom gland. The results also support the existence of posttranslational modifications and of proteolytic processing of precursor molecules which could lead to diverse multifunctional proteins. This study provides a preliminary reference map for proteins and peptides present in Bothrops insularis whole venom establishing the basis for comparative studies of other venom proteomes which could help the search for new drugs and the improvement of venom therapeutics. Altogether, our data point to the influence of transcriptional and post-translational events on the final venom composition and stress the need for a multivariate approach to snake venomics studies. (c) 2009 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we introduce a Bayesian analysis for bioequivalence data assuming multivariate pharmacokinetic measures. With the introduction of correlation parameters between the pharmacokinetic measures or between the random effects in the bioequivalence models, we observe a good improvement in the bioequivalence results. These results are of great practical interest since they can yield higher accuracy and reliability for the bioequivalence tests, usually assumed by regulatory offices. An example is introduced to illustrate the proposed methodology by comparing the usual univariate bioequivalence methods with multivariate bioequivalence. We also consider some usual existing discrimination Bayesian methods to choose the best model to be used in bioequivalence studies.