994 results for Transformed data
Abstract:
The growth of technologies available on the Web has favoured the emergence of many forms of information, resources and services. This growth, combined with people's constant need for training and development, both personal and professional, has encouraged the development of the field of adaptive educational hypermedia systems (SHAE). These systems can adapt instruction according to the student model, personal characteristics, needs, and other aspects. SHAE have made it possible to change the way teaching is delivered, moving from traditional instruction restricted to textbooks to computer tools that deliver didactic material over the Internet, favouring individualized instruction. SHAE generate a large volume of data: the information contained in the student model and all the data on each student's learning process. These data are easily ignored, without the careful analysis that would improve our knowledge of student behaviour during instruction, adapt the form of learning to each student, and improve the results obtained. The objective of this work was to select and apply some Data Mining techniques to a SHAE, PCMAT - Mathematics Collaborative Educational System. Applying these techniques produced data models that turned the data into useful, understandable information, essential for generating new student profiles, patterns of student behaviour, and adaptation and pedagogical rules. In this work several data models were built using the classification Data Mining technique with different algorithms. The results will make it possible to define new adaptation rules and patterns of student behaviour, which may improve the learning process available in a SHAE.
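As an illustration of the classification step described above, a minimal Python sketch follows; the feature names, data, and choice of DecisionTreeClassifier are hypothetical, since the abstract does not specify PCMAT's student-model attributes or algorithms.

```python
# Hypothetical sketch: classifying student profiles from learning-process logs.
# Feature names and data are invented; PCMAT's actual student model is not
# described in the abstract.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Per-student features: time on task (min), exercises solved, error rate.
X = [[120, 35, 0.20], [45, 10, 0.55], [200, 60, 0.10], [80, 22, 0.35],
     [150, 48, 0.15], [30, 5, 0.60], [170, 52, 0.12], [60, 15, 0.45]]
y = ["visual", "verbal", "visual", "verbal",
     "visual", "verbal", "visual", "verbal"]   # learning-style labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```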
Abstract:
The most suitable method for estimating size diversity is investigated. Size diversity is computed on the basis of the Shannon diversity expression adapted for continuous variables, such as size. It takes the form of an integral involving the probability density function (pdf) of the size of the individuals. Different approaches for the estimation of the pdf are compared: parametric methods, which assume that the data come from a particular family of pdfs, and nonparametric methods, where the pdf is estimated by some kind of local evaluation. Exponential, generalized Pareto, normal, and log-normal distributions were used to generate simulated samples with parameters estimated from real samples. The nonparametric methods include discrete computation of data histograms based on size intervals and continuous kernel estimation of the pdf. The kernel approach gives an accurate estimation of size diversity, whereas the parametric methods are useful only when the reference distribution has a shape similar to the real one. Special attention is given to data standardization. Division of the data by the sample geometric mean is proposed as the most suitable standardization method, which has additional advantages: the same size diversity value is obtained whether the original sizes or log-transformed data are used, and size measurements of different dimensionality (lengths, areas, volumes, or biomasses) can be compared directly by simply adding ln k, where k is the dimensionality (1, 2, or 3, respectively). Thus, kernel estimation after standardization by division by the sample geometric mean arises as the most reliable and generalizable method of size diversity evaluation.
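A minimal sketch of the kernel approach, assuming a Gaussian kernel and numerical integration of the continuous Shannon expression H = -∫ p(x) ln p(x) dx on a grid; the sample is synthetic, not the study's data.

```python
# Kernel estimate of size diversity after geometric-mean standardization.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sizes = rng.lognormal(mean=1.0, sigma=0.8, size=500)   # synthetic body sizes

# Standardize by the sample geometric mean, as proposed in the abstract.
geo_mean = np.exp(np.mean(np.log(sizes)))
std_sizes = sizes / geo_mean

# Gaussian-kernel pdf estimate, then H = -integral p(x) ln p(x) dx on a grid.
kde = gaussian_kde(std_sizes)
grid = np.linspace(std_sizes.min(), std_sizes.max(), 2000)
dx = grid[1] - grid[0]
p = np.clip(kde(grid), 1e-300, None)       # avoid log(0)
H = -np.sum(p * np.log(p)) * dx
print(f"size diversity H = {H:.3f}")
```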
Abstract:
Factor analysis, a frequent technique for multivariate data inspection, is also widely used for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1), with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛ^T + ψ (2), where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ, as well as the loadings matrix Λ, are estimated from an estimate of Cov(y). Given observed clr transformed data Y as realizations of the random vector y, outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y leads to robust estimates of Λ and ψ in (2); see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix Y, which is not the case for clr transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y) = V C(Z) V^T, where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in model (2) can be estimated (Basilevsky, 1994), and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure is applied to data from geochemistry. Our special interest is in comparing the results with those of Reimann et al. (2002) for the Kola project data.
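A hedged sketch of the ilr route to a robust clr covariance, C(Y) = V C(Z) V^T, using scikit-learn's MinCovDet as the MCD estimator named above; the compositions are synthetic, and the particular orthonormal basis is one common choice, not necessarily the paper's.

```python
# Robust covariance of compositional data via the ilr transformation.
import numpy as np
from sklearn.covariance import MinCovDet

def ilr_basis(D):
    """Orthonormal basis V (D x (D-1)) of the clr hyperplane (columns sum to 0)."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

rng = np.random.default_rng(1)
X = rng.dirichlet(alpha=[4, 3, 2, 1], size=200)     # synthetic compositions, D = 4
V = ilr_basis(X.shape[1])

logX = np.log(X)
Y = logX - logX.mean(axis=1, keepdims=True)         # clr coordinates (rank D-1)
Z = Y @ V                                           # ilr coordinates (full rank)

C_Z = MinCovDet(random_state=0).fit(Z).covariance_  # robust MCD covariance
C_Y = V @ C_Z @ V.T                                 # back-transform to clr space
print(C_Y.round(3))
```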
Abstract:
Matheron's usual variogram estimator can result in unreliable variograms when data are strongly asymmetric or skewed. Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers belonging to another population that contaminate the primary process. This paper examines the effects of underlying asymmetry on the variogram and on the accuracy of prediction; a companion paper examines the effects arising from outliers. Standard geostatistical texts suggest ways of dealing with underlying asymmetry; however, these suggestions are based on informed intuition rather than detailed investigation. To determine whether the methods generally used to deal with underlying asymmetry are appropriate, the effects of different coefficients of skewness on the shape of the experimental variogram and on the model parameters were investigated. Simulated annealing was used to create normally distributed random fields of different sizes from variograms with different nugget:sill ratios. These data were then modified to give different degrees of asymmetry, and the experimental variogram was computed in each case. The effects of standard data transformations on the form of the variogram were also investigated. Cross-validation was used to assess quantitatively the performance of the different variogram models for kriging. The results showed that the shape of the variogram was affected by the degree of asymmetry, and that the effect increased as the size of the data set decreased. Transformations of the data were more effective in reducing the skewness coefficient in the larger data sets. Cross-validation confirmed that variogram models from transformed data were more suitable for kriging than those from the raw asymmetric data. The results of this study have implications for 'standard best practice' in dealing with asymmetry in data for geostatistical analyses. (C) 2007 Elsevier Ltd. All rights reserved.
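For reference, a minimal sketch of Matheron's estimator, γ(h) = (1/2N(h)) Σ (z(x_i) − z(x_i + h))², on a 1-D transect, comparing raw and log-transformed values; the data are synthetic and positively skewed, and the study's simulated-annealing fields and cross-validation are not reproduced.

```python
# Matheron's method-of-moments variogram on a regular 1-D transect.
import numpy as np

def matheron_variogram(z, max_lag):
    """gamma(h) = (1 / 2N(h)) * sum over pairs of (z_i - z_{i+h})**2."""
    return np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
z = np.exp(rng.normal(size=400).cumsum() * 0.05)   # skewed, spatially correlated
gamma_raw = matheron_variogram(z, max_lag=20)
gamma_log = matheron_variogram(np.log(z), max_lag=20)
print(gamma_raw[:5].round(3), gamma_log[:5].round(4))
```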
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Abstract:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method and linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate and can therefore distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n-cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections: the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior, as it gave an accurate estimation of the initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of the initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and is thus theoretically more accurate and reliable. This method is easy to perform, and the taking-difference strategy can be extended to all current methods for qPCR data analysis.
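A small sketch of the taking-difference idea under the usual exponential-phase model F_n = B + F0·E^n: the consecutive difference d_n = F_{n+1} − F_n = F0·E^n·(E − 1) is background-free, so linear regression of ln d_n on n recovers E and F0. The data and the cycle window are simulated, not the paper's.

```python
# Taking-difference linear regression for qPCR (simulated data).
import numpy as np

E_true, F0_true, B = 1.9, 5e-4, 0.3
n = np.arange(1, 26)                       # exponential-phase cycles only
F = B + F0_true * E_true ** n              # simulated raw fluorescence

d = np.diff(F)                             # n raw cycles -> n-1 differences
x = n[:-1]
slope, intercept = np.polyfit(x, np.log(d), 1)

E_hat = np.exp(slope)                      # amplification efficiency
F0_hat = np.exp(intercept) / (E_hat - 1)   # initial amount; background cancels
print(f"E = {E_hat:.3f}, F0 = {F0_hat:.2e}")
```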
Abstract:
I developed a new model for estimating annual production-to-biomass ratio P/B and production P of macrobenthic populations in marine and freshwater habitats. Self-learning artificial neural networks (ANN) were used to model the relationships between P/B and twenty easy-to-measure abiotic and biotic parameters in 1252 data sets of population production. Based on log-transformed data, the final predictive model estimates log(P/B) with reasonable accuracy and precision (r2 = 0.801; residual mean square RMS = 0.083). Body mass and water temperature contributed most to the explanatory power of the model. However, as with all least squares models using nonlinearly transformed data, back-transformation to natural scale introduces a bias in the model predictions, i.e., an underestimation of P/B (and P). When estimating production of assemblages of populations by adding up population estimates, accuracy decreases but precision increases with the number of populations in the assemblage.
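The back-transformation bias mentioned above can be illustrated numerically: if log(P/B) has normally distributed error with variance σ², the antilog of the mean log prediction underestimates the natural-scale mean by roughly a factor exp(σ²/2). A tiny sketch, assuming natural logs for simplicity and reusing the quoted RMS as the error variance:

```python
# Lognormal back-transformation bias (illustrative assumption: natural logs).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 0.5, 0.083                    # sigma2 borrowed from the RMS above
log_pb = rng.normal(mu, np.sqrt(sigma2), size=100_000)

naive = np.exp(mu)                         # naive back-transformed prediction
actual = np.exp(log_pb).mean()             # natural-scale mean (larger)
print(naive, actual, naive * np.exp(sigma2 / 2))  # bias-corrected estimate
```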
Abstract:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field contended with this dilemma for many years, resorting either to hardware approaches, which are rather costly and have inherent calibration and noise effects, or to software techniques based on filtering the binning effect, which fail to preserve the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data, knowing that its statistical meaning has been faithfully preserved for optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of this inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seems intricate, the concise nature of the derivations allows for an implementation procedure that lends itself to real-time implementation using lookup tables, a task that is also introduced in this dissertation.
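As a generic illustration only (the dissertation's patented mapping is not detailed in the abstract), a precomputed lookup table can map linear detector channels to log-scale histogram bins so that accumulation reduces to a constant-time table read:

```python
# Generic lookup-table log binning; NOT the dissertation's method.
import numpy as np

n_channels, n_bins = 1024, 256
channels = np.arange(1, n_channels + 1)                  # start at 1: avoid log(0)
lut = np.floor(np.log(channels) / np.log(n_channels)
               * (n_bins - 1)).astype(int)               # channel -> log bin

rng = np.random.default_rng(4)
events = rng.integers(1, n_channels + 1, size=100_000)   # simulated event stream
hist = np.bincount(lut[events - 1], minlength=n_bins)    # constant-time accumulation
print(hist[:10])
```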
Abstract:
Background: Tramadol is a well tolerated and effective analgesic used to treat moderate to severe pain. Several generic formulations of tramadol are available in Brazil; however, published information regarding their bioequivalence in the Brazilian population is not available. A study was designed for Brazilian regulatory authorities to allow marketing of a generic formulation. Objective: The purpose of this study was to assess the bioequivalence of 2 commercial tablet preparations containing tramadol 100 mg marketed for use in Brazil. Methods: A randomized, open-label, 2 x 2 crossover study was performed in healthy Brazilian volunteers under fasting conditions with a washout period of 12 days. Two tablet formulations of tramadol 100 mg (test and reference formulations) were administered as a single oral dose, and blood samples were collected over 24 hours. Tramadol plasma concentrations were quantified using a validated HPLC method. A plasma concentration time profile was generated for each volunteer and then mean values were determined, from which C(max), T(max), AUC(0-t), AUC(0-infinity), k(e), and t(1/2) were calculated using a noncompartmental model. Bioequivalence between the products was determined by calculating 90% CIs for the ratios of C(max), AUC(0-t), and AUC(0-infinity) values for the test and reference products using log-transformed data. Tolerability was assessed by monitoring vital signs (temperature, blood pressure, heart rate), laboratory tests (hematology, blood biochemistry, hepatic function, urinalysis), and interviews with the volunteers before medication administration and every 2 hours during the study. Results: Twenty-six healthy volunteers (13 men, 13 women) were enrolled in and completed the study. Mean (SD) age was 30 (6.8) years (range, 21-44 years), mean weight was 64 (8.3) kg (range, 53-79 kg), and mean height was 166 (6.4) cm (range, 155-178 cm). The 90% CIs for the ratios of C(max) (1.01-1.17), AUC(0-t) (1.00-1.13), and AUC(0-infinity) (1.00-1.14) values for the test and reference products fell within the interval of 0.80 to 1.25 proposed by most regulatory agencies, including the Brazilian regulatory body. No clinically important adverse effects were reported; only mild somnolence was reported by 4 volunteers and mild headaches by 5 volunteers, and there was no need to use medication to treat these symptoms. Conclusion: Pharmacokinetic analysis in these healthy Brazilian volunteers suggested that the test and reference formulations of tramadol 100-mg tablets met the regulatory requirements to assume bioequivalence based on the Brazilian regulatory definition. (Clin Ther 2010;32:758-765) (C) 2010 Excerpta Medica Inc.
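A simplified sketch of the log-transformed 90% CI computation used in bioequivalence studies like the one above; a full 2 x 2 crossover ANOVA with period and sequence effects is omitted, and the values are invented, not the study's data.

```python
# Paired-style 90% CI for a Cmax ratio on log-transformed data (simplified).
import numpy as np
from scipy import stats

cmax_test = np.array([310, 285, 402, 350, 298, 367, 330, 275])
cmax_ref = np.array([295, 300, 380, 335, 310, 340, 315, 290])

d = np.log(cmax_test) - np.log(cmax_ref)       # within-subject log ratios
n = len(d)
se = d.std(ddof=1) / np.sqrt(n)
t = stats.t.ppf(0.95, df=n - 1)                # two-sided 90% CI
lo, hi = np.exp(d.mean() - t * se), np.exp(d.mean() + t * se)
print(f"90% CI for Cmax ratio: {lo:.3f}-{hi:.3f} (accept if within 0.80-1.25)")
```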
Abstract:
The purpose of this study was to evaluate the bioequivalence of two commercial 8 mg tablet formulations of ondansetron available in the Brazilian market. A simple, rapid, sensitive, and selective liquid chromatography-tandem mass spectrometry method is described for the determination of ondansetron in human plasma samples. The method was validated over a concentration range of 2.5-60 ng/ml and used in a bioequivalence trial between orally disintegrating and conventional tablet ondansetron formulations, to assess its usefulness in this kind of study. Vonau flash (R) (Biolab Sanus Farmaceutica, Brazil, as test formulation) and Zofran (R) (GlaxoSmithKline, Brazil, as reference formulation) were evaluated following a single 8 mg dose given to 23 healthy volunteers of both genders. The dose was administered after an overnight fast according to a two-way crossover design. Bioequivalence between the products was determined by calculating the 90% confidence interval (90% CI) for the ratios of C(max), AUC(0-t), and AUC(0-infinity) values for the test and reference products, using logarithmically transformed data. The 90% confidence intervals for the ratios of C(max) (87.5-103.8%), AUC(0-t) (89.3-107.2%), and AUC(0-infinity) (89.7-106.0%) values for the test and reference products fall within the 80-125% interval proposed by the FDA, EMEA, and ANVISA. It was concluded that the two ondansetron formulations are bioequivalent in their rate and extent of absorption. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
Background: Zidovudine is a thymidine nucleoside reverse transcriptase inhibitor with activity against HIV type 1. Some (approximately 8) generic formulations of zidovudine are available in Brazil; however, based on a literature search, information concerning their bioavailability and pharmacokinetic properties in the Brazilian population has not been reported. Objective: The aim of this study was to compare the bioavailability and pharmacokinetic properties of 2 capsule formulations of zidovudine 100 mg in healthy Brazilian volunteers. Methods: This open-label, randomized, 2-way crossover study utilized a 1-week washout period between doses. Blood samples were collected for 8 hours after a single dose of the zidovudine 100-mg test (Zidovudina, Fundação para o Remédio Popular, São Paulo, Brazil) or reference formulation (Retrovir (R), GlaxoSmithKline, Philadelphia, Pennsylvania). Plasma zidovudine concentrations were determined using a validated high-performance liquid chromatography method with ultraviolet detection at 265 nm. C-max, T-max, AUC(0-t), AUC(0-infinity), t(1/2), and the elimination constant (k(e)) were determined using noncompartmental analysis. The formulations were considered bioequivalent if the 90% CIs for C-max, AUC(0-t), and AUC(0-infinity) fell within the interval of 80% to 125%, the regulatory definition set by the US Food and Drug Administration (FDA). Results: Twenty-four healthy volunteers (12 males, 12 females; mean age, 27 years; weight, 60 kg; height, 167 cm) were enrolled and completed the study. The 90% CIs of the treatment ratios for the logarithmic-transformed values of C-max, AUC(0-t), and AUC(0-infinity) were 80.0% to 113.6%, 93.9% to 109.7%, and 93.6% to 110.1%, respectively. The values for the test and reference formulations were within the FDA bioequivalence definition interval of 80% to 125%. Conclusions: In this small study in healthy subjects, no statistically significant differences in C-max, AUC(0-t), and AUC(0-infinity) were found between the test and reference formulations of zidovudine 100-mg capsules. The 90% CIs for the mean ratio values of AUC(0-t), AUC(0-infinity), and C-max for the test and reference formulations indicated that the reported data were entirely within the bioequivalence acceptance range of 80% to 125% proposed by the FDA (using log-transformed data).
Abstract:
The superior cervical ganglion (SCG) in mammals varies in structure according to developmental age, body size, gender, lateral asymmetry, the size and nuclear content of neurons, and the complexity and synaptic coverage of their dendritic trees. In small and medium-sized mammals, neuron number and size increase from birth to adulthood and, in phylogenetic studies, vary with body size. However, recent studies on larger animals suggest that body weight does not, in general, accurately predict neuron number. We have applied design-based stereological tools at the light-microscopic level to assess the volumetric composition of ganglia and to estimate the numbers and sizes of neurons in SCGs from rats, capybaras and horses. Using transmission electron microscopy, we have obtained design-based estimates of the surface coverage of dendrites by postsynaptic apposition zones and model-based estimates of the numbers and sizes of synaptophysin-labelled axo-dendritic synaptic disks. Linear regression analysis of log-transformed data has been undertaken in order to establish the nature of the relationships between numbers and SCG volume (V_scg). For SCGs (five per species), the allometric relationship for neuron number (N) is N = 35,067 × V_scg^0.781 and that for synapses is N = 20,095,000 × V_scg^1.328, the former relationship being a good predictor of neuron number and the latter a poor predictor of synapse number. Our findings thus reveal the nature of SCG growth in terms of its main ingredients (neurons, neuropil, blood vessels) and show that larger mammals have SCG neurons exhibiting more complex arborizations and greater numbers of axo-dendritic synapses.
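A short sketch of the log-log (allometric) fit behind relationships of the form N = a·V^b: linear regression on log-transformed data recovers ln a as the intercept and b as the slope. The data here are synthetic, generated from the quoted coefficients, not the SCG measurements.

```python
# Allometric fit via linear regression on log-transformed data.
import numpy as np

rng = np.random.default_rng(5)
V = rng.uniform(0.5, 20.0, size=40)                   # ganglion volumes (a.u.)
N = 35067 * V ** 0.781 * rng.lognormal(0, 0.1, 40)    # noisy allometric counts

b, ln_a = np.polyfit(np.log(V), np.log(N), 1)         # slope = b, intercept = ln a
print(f"N = {np.exp(ln_a):.0f} * V^{b:.3f}")
```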
Abstract:
This paper aims to establish the best way to express the parasitemia of animals experimentally infected with Trypanosoma cruzi. Individual scores may show great variability, which is not emphasized by most authors. A group of 50 rats infected with 1×10^6 trypomastigotes of the T. cruzi Y strain was used, and the parasitemia was estimated by Brener's method. The results showed that the median can avoid false results due to very high or low parasitemias, but it lacks the mathematical properties necessary for analysis of variance. The comparison of the means of the original and transformed data, with their respective coefficients of variability (CV), showed that the logarithmic mean (Mlog) has the lowest CV. Therefore, the Mlog is the best way to express the parasitemia when the data show great variability. The number of animals per group did not affect the variability of the data when the Mlog and CV were used.
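A small sketch of the comparison described above: the CV of raw counts versus the CV of log-transformed counts, whose antilog mean is the Mlog. The counts are simulated and highly variable, and base-10 logs are assumed, since the abstract does not specify the base.

```python
# Logarithmic mean (Mlog) and CV comparison on simulated parasitemia counts.
import numpy as np

rng = np.random.default_rng(6)
parasitemia = rng.lognormal(mean=10, sigma=1.2, size=50)   # synthetic counts

cv = lambda x: 100 * x.std(ddof=1) / x.mean()              # CV in percent
logs = np.log10(parasitemia)
print(f"CV raw = {cv(parasitemia):.1f}%")
print(f"CV log = {cv(logs):.1f}%")                          # much smaller
print(f"Mlog   = {10 ** logs.mean():.3e}")                  # logarithmic mean
```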