852 resultados para Initial data problem
Resumo:
Despite the central role of quantitative PCR (qPCR) in the quantification of mRNA transcripts, most analyses of qPCR data are still delegated to the software that comes with the qPCR apparatus. This is especially true for the handling of the fluorescence baseline. This article shows that baseline estimation errors are directly reflected in the observed PCR efficiency values and are thus propagated exponentially in the estimated starting concentrations as well as 'fold-difference' results. Because of the unknown origin and kinetics of the baseline fluorescence, the fluorescence values monitored in the initial cycles of the PCR reaction cannot be used to estimate a useful baseline value. An algorithm that estimates the baseline by reconstructing the log-linear phase downward from the early plateau phase of the PCR reaction was developed and shown to lead to very reproducible PCR efficiency values. PCR efficiency values were determined per sample by fitting a regression line to a subset of data points in the log-linear phase. The variability, as well as the bias, in qPCR results was significantly reduced when the mean of these PCR efficiencies per amplicon was used in the calculation of an estimate of the starting concentration per sample.
Resumo:
The DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data are quickly becoming essential knowledge repositories of the research community. This present paper surveys several databases, which are considered "pillars" of research and important nodes in the network. This paper focuses on a generalized workflow scheme typical for microarray experiments using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
A large number of parameters have been identified as predictors of early outcome in patients with acute ischemic stroke. In the present work we analyzed a wide range of demographic, metabolic, physiological, clinical, laboratory and neuroimaging parameters in a large population of consecutive patients with acute ischemic stroke with the aim of identifying independent predictors of the early clinical course. We used prospectively collected data from the Acute Stroke Registry and Analysis of Lausanne. All consecutive patients with ischemic stroke admitted to our stroke unit and/or intensive care unit between 1 January 2003 and 12 December 2008 within 24 h after last-well time were analyzed. Univariate and multivariate analyses were performed to identify significant associations with the National Institute of Health Stroke Scale (NIHSS) score at admission and 24 h later. We also sought any interactions between the identified predictors. Of the 1,730 consecutive patients with acute ischemic stroke who were included in the analysis, 260 (15.0%) were thrombolyzed (mostly intravenously) within the recommended time window. In multivariate analysis, the NIHSS score at 24 h after admission was associated with the NIHSS score at admission (β = 1, p < 0.001), initial glucose level (β = 0.05, p < 0.002) and thrombolytic intervention (β = -2.91, p < 0.001). There was a significant interaction between thrombolysis and the NIHSS score at admission (p < 0.001), indicating that the short-term effect of thrombolysis decreases with increasing initial stroke severity. Thrombolytic treatment, lower initial glucose level and lower initial stroke severity predict a favorable early clinical course. The short-term effect of thrombolysis appears mainly in minor and moderate strokes, and decreases with increasing initial stroke severity.
Resumo:
This is a 2006 national report to the EMCDDA, using 2005 data. It is compiled by the Reitox national focal point and covers epidemiology, policing, strategy, drugs markets, drug-related infectious diseases, drug-related death and problem drug use in Norway.This resource was contributed by The National Documentation Centre on Drug Use.
Resumo:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale for the purpose of improving predictions of groundwater flow and solute transport. However, extending corresponding approaches to the regional scale still represents one of the major challenges in the domain of hydrogeophysics. To address this problem, we have developed a regional-scale data integration methodology based on a two-step Bayesian sequential simulation approach. Our objective is to generate high-resolution stochastic realizations of the regional-scale hydraulic conductivity field in the common case where there exist spatially exhaustive but poorly resolved measurements of a related geophysical parameter, as well as highly resolved but spatially sparse collocated measurements of this geophysical parameter and the hydraulic conductivity. To integrate this multi-scale, multi-parameter database, we first link the low- and high-resolution geophysical data via a stochastic downscaling procedure. This is followed by relating the downscaled geophysical data to the high-resolution hydraulic conductivity distribution. After outlining the general methodology of the approach, we demonstrate its application to a realistic synthetic example where we consider as data high-resolution measurements of the hydraulic and electrical conductivities at a small number of borehole locations, as well as spatially exhaustive, low-resolution estimates of the electrical conductivity obtained from surface-based electrical resistivity tomography. The different stochastic realizations of the hydraulic conductivity field obtained using our procedure are validated by comparing their solute transport behaviour with that of the underlying ?true? hydraulic conductivity field. We find that, even in the presence of strong subsurface heterogeneity, our proposed procedure allows for the generation of faithful representations of the regional-scale hydraulic conductivity structure and reliable predictions of solute transport over long, regional-scale distances.
Resumo:
In an earlier investigation (Burger et al., 2000) five sediment cores near the RodriguesTriple Junction in the Indian Ocean were studied applying classical statistical methods(fuzzy c-means clustering, linear mixing model, principal component analysis) for theextraction of endmembers and evaluating the spatial and temporal variation ofgeochemical signals. Three main factors of sedimentation were expected by the marinegeologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. Thedisplay of fuzzy membership values and/or factor scores versus depth providedconsistent results for two factors only; the ultra-basic component could not beidentified. The reason for this may be that only traditional statistical methods wereapplied, i.e. the untransformed components were used and the cosine-theta coefficient assimilarity measure.During the last decade considerable progress in compositional data analysis was madeand many case studies were published using new tools for exploratory analysis of thesedata. Therefore it makes sense to check if the application of suitable data transformations,reduction of the D-part simplex to two or three factors and visualinterpretation of the factor scores would lead to a revision of earlier results and toanswers to open questions . In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005) starting with a problem-oriented interpretation of the biplotscattergram, extracting compositional factors, ilr-transformation of the components andvisualization of the factor scores in a spatial context: The compositional factors will beplotted versus depth (time) of the core samples in order to facilitate the identification ofthe expected sources of the sedimentary process.Kew words: compositional data analysis, biplot, deep sea sediments
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developedto explore patterns in high-dimensional multivariate data. The conventional versionof the algorithm involves the use of Euclidean metric in the process of adaptation ofthe model vectors, thus rendering in theory a whole methodology incompatible withnon-Euclidean geometries.In this contribution we explore the two main aspects of the problem:1. Whether the conventional approach using Euclidean metric can shed valid resultswith compositional data.2. If a modification of the conventional approach replacing vectorial sum and scalarmultiplication by the canonical operators in the simplex (i.e. perturbation andpowering) can converge to an adequate solution.Preliminary tests showed that both methodologies can be used on compositional data.However, the modified version of the algorithm performs poorer than the conventionalversion, in particular, when the data is pathological. Moreover, the conventional ap-proach converges faster to a solution, when data is \well-behaved".Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
In this paper we examine the problem of compositional data from a different startingpoint. Chemical compositional data, as used in provenance studies on archaeologicalmaterials, will be approached from the measurement theory. The results will show, in avery intuitive way that chemical data can only be treated by using the approachdeveloped for compositional data. It will be shown that compositional data analysis is aparticular case in projective geometry, when the projective coordinates are in thepositive orthant, and they have the properties of logarithmic interval metrics. Moreover,it will be shown that this approach can be extended to a very large number ofapplications, including shape analysis. This will be exemplified with a case study inarchitecture of Early Christian churches dated back to the 5th-7th centuries AD
Resumo:
Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr)transformation to obtain the random vector y of dimension D. The factor model istheny = Λf + e (1)with the factors f of dimension k & D, the error term e, and the loadings matrix Λ.Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysismodel (1) can be written asCov(y) = ΛΛT + ψ (2)where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as theloadings matrix Λ are estimated from an estimation of Cov(y).Given observed clr transformed data Y as realizations of the random vectory. Outliers or deviations from the idealized model assumptions of factor analysiscan severely effect the parameter estimation. As a way out, robust estimation ofthe covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), seePison et al. (2003). Well known robust covariance estimators with good statisticalproperties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), relyon a full-rank data matrix Y which is not the case for clr transformed data (see,e.g., Aitchison, 1986).The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves thissingularity problem. The data matrix Y is transformed to a matrix Z by usingan orthonormal basis of lower dimension. Using the ilr transformed data, a robustcovariance matrix C(Z) can be estimated. The result can be back-transformed tothe clr space byC(Y ) = V C(Z)V Twhere the matrix V with orthonormal columns comes from the relation betweenthe clr and the ilr transformation. Now the parameters in the model (2) can beestimated (Basilevsky, 1994) and the results have a direct interpretation since thelinks to the original variables are still preserved.The above procedure will be applied to data from geochemistry. Our specialinterest is on comparing the results with those of Reimann et al. (2002) for the Kolaproject data
Resumo:
Background. DNA-damage assays, quantifying the initial number of DNA double-strand breaks induced by radiation, have been proposed as a predictive test for radiation-induced toxicity. Determination of radiation-induced apoptosis in peripheral blood lymphocytes by flow cytometry analysis has also been proposed as an approach for predicting normal tissue responses following radiotherapy. The aim of the present study was to explore the association between initial DNA damage, estimated by the number of double-strand breaks induced by a given radiation dose, and the radio-induced apoptosis rates observed. Methods. Peripheral blood lymphocytes were taken from 26 consecutive patients with locally advanced breast carcinoma. Radiosensitivity of lymphocytes was quantified as the initial number of DNA double-strand breaks induced per Gy and per DNA unit (200 Mbp). Radio-induced apoptosis at 1, 2 and 8 Gy was measured by flow cytometry using annexin V/propidium iodide. Results. Radiation-induced apoptosis increased in order to radiation dose and data fitted to a semi logarithmic mathematical model. A positive correlation was found among radio-induced apoptosis values at different radiation doses: 1, 2 and 8 Gy (p < 0.0001 in all cases). Mean DSB/Gy/DNA unit obtained was 1.70 ± 0.83 (range 0.63-4.08; median, 1.46). A statistically significant inverse correlation was found between initial damage to DNA and radio-induced apoptosis at 1 Gy (p = 0.034). A trend toward 2 Gy (p = 0.057) and 8 Gy (p = 0.067) was observed after 24 hours of incubation. Conclusions. An inverse association was observed for the first time between these variables, both considered as predictive factors to radiation toxicity.
Resumo:
The integration of geophysical data into the subsurface characterization problem has been shown in many cases to significantly improve hydrological knowledge by providing information at spatial scales and locations that is unattainable using conventional hydrological measurement techniques. The investigation of exactly how much benefit can be brought by geophysical data in terms of its effect on hydrological predictions, however, has received considerably less attention in the literature. Here, we examine the potential hydrological benefits brought by a recently introduced simulated annealing (SA) conditional stochastic simulation method designed for the assimilation of diverse hydrogeophysical data sets. We consider the specific case of integrating crosshole ground-penetrating radar (GPR) and borehole porosity log data to characterize the porosity distribution in saturated heterogeneous aquifers. In many cases, porosity is linked to hydraulic conductivity and thus to flow and transport behavior. To perform our evaluation, we first generate a number of synthetic porosity fields exhibiting varying degrees of spatial continuity and structural complexity. Next, we simulate the collection of crosshole GPR data between several boreholes in these fields, and the collection of porosity log data at the borehole locations. The inverted GPR data, together with the porosity logs, are then used to reconstruct the porosity field using the SA-based method, along with a number of other more elementary approaches. Assuming that the grid-cell-scale relationship between porosity and hydraulic conductivity is unique and known, the porosity realizations are then used in groundwater flow and contaminant transport simulations to assess the benefits and limitations of the different approaches.
Resumo:
The application of compositional data analysis through log ratio trans-formations corresponds to a multinomial logit model for the shares themselves.This model is characterized by the property of Independence of Irrelevant Alter-natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactlythis invariance of the ratio that underlies the commonly used zero replacementprocedure in compositional data analysis. In this paper we investigate using thenested logit model that does not embody IIA and an associated zero replacementprocedure and compare its performance with that of the more usual approach ofusing the multinomial logit model. Our comparisons exploit a data set that com-bines voting data by electoral division with corresponding census data for eachdivision for the 2001 Federal election in Australia
Resumo:
This analysis was stimulated by the real data analysis problem of householdexpenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that tryto add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spendingexcluding alcohol/tobacco similar for teetotal and non-teetotal households?In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than onecomponent, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durableswithin the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.While this analysis is based on around economic data, the ideas carry over tomany other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables)
Resumo:
Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data.Many of the issues that are discussed with reference to the statistical analysis of compositionaldata have a natural counterpart in the construction of a Bayesian statistical model for categoricaldata.This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986)in his seminal book on compositional data. Particular emphasis is put on the problem of whatparameterization to use