Digital Library

999 results for data mart

Using self organizing maps on compositional data

Relevance: 20.00%

Abstract:

The self-organizing map (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric when adapting the model vectors, which in theory renders the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem:
1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data.
2. Whether a modification of the conventional approach, replacing vector sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution.
Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved".
Key words: Self Organizing Map; Artificial Neural networks; Compositional data
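The two update rules contrasted in this abstract can be illustrated with a minimal sketch. It assumes the standard SOM adaptation step m <- m + alpha (x - m) for the conventional version, and its simplex analogue built from perturbation and powering for the modified version. The function names and the omission of the neighbourhood function are assumptions made for illustration, not the authors' implementation.

```python
import math


def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure([a * b for a, b in zip(x, y)])


def power(alpha, x):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure([a ** alpha for a in x])


def euclidean_update(m, x, alpha):
    """Conventional step: m + alpha * (x - m)."""
    return [mi + alpha * (xi - mi) for mi, xi in zip(m, x)]


def simplex_update(m, x, alpha):
    """Modified step: m perturbed by alpha-powered (x 'minus' m)."""
    inverse_m = [1.0 / v for v in m]      # perturbation inverse of m
    difference = perturb(x, inverse_m)    # x 'minus' m in the simplex
    return perturb(m, power(alpha, difference))


if __name__ == "__main__":
    model = [0.2, 0.3, 0.5]   # hypothetical model vector
    sample = [0.6, 0.3, 0.1]  # hypothetical data point
    print(euclidean_update(model, sample, 0.5))
    print(simplex_update(model, sample, 0.5))
```

With alpha = 1 both rules move the model vector onto the data point, and with alpha = 0 both leave it unchanged; they differ only in the path taken in between.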

Comparing methods for dimensionality reduction when data are density functions

Relevance: 20.00%

Abstract:

Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods for dimensionality reduction for this particular type of data: functional principal components analysis (PCA) with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).

Dynamic graphics of parametrically linked multivariate methods used in compositional data analysis

Relevance: 20.00%

Abstract:

Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
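As a rough illustration of how a power transformation can link two analyses, the sketch below uses the Box-Cox form (x^alpha - 1) / alpha, which equals x - 1 at alpha = 1 and tends to log(x) as alpha approaches 0. This particular parametrization and the function names are assumptions chosen for illustration; the abstract does not state which form the author uses.

```python
import math


def box_cox(x, alpha):
    """Box-Cox power transformation; the limit alpha -> 0 is log(x)."""
    if alpha == 0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha


def movie_frames(values, alphas):
    """Recompute the transformed data once per value of the linking parameter."""
    return [(alpha, [box_cox(v, alpha) for v in values]) for alpha in alphas]


if __name__ == "__main__":
    data = [0.5, 1.0, 2.0, 4.0]            # hypothetical positive data
    steps = [1.0, 0.5, 0.25, 0.1, 0.01, 0]  # linking parameter, frame by frame
    for alpha, frame in movie_frames(data, steps):
        print(alpha, [round(v, 4) for v in frame])
```

Each frame would feed the same downstream mapping method, so the display changes smoothly from an analysis of the untransformed data towards one of the log-transformed data.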

Clinical pharmacogenomic testing of KRAS, BRAF and EGFR mutations by high resolution melting analysis and ultra-deep pyrosequencing

Relevance: 20.00%

Abstract:

BACKGROUND: Epidermal growth factor receptor (EGFR) and its downstream factors KRAS and BRAF are mutated in several types of cancer, affecting the clinical response to EGFR inhibitors. Mutations in the EGFR kinase domain predict sensitivity to the tyrosine kinase inhibitors gefitinib and erlotinib in lung adenocarcinoma, while activating point mutations in KRAS and BRAF confer resistance to the anti-EGFR monoclonal antibody cetuximab in colorectal cancer. The development of new generation methods for systematic mutation screening of these genes will allow more appropriate therapeutic choices. METHODS: We describe a high resolution melting (HRM) assay for mutation detection in EGFR exons 19-21, KRAS codon 12/13 and BRAF V600 using formalin-fixed paraffin-embedded samples. Somatic variation of KRAS exon 2 was also analysed by massively parallel pyrosequencing of amplicons with the GS Junior 454 platform. RESULTS: We tested 120 routine diagnostic specimens from patients with colorectal or lung cancer. Mutations in KRAS, BRAF and EGFR were observed in 41.9%, 13.0% and 11.1% of the overall samples, respectively, being mutually exclusive. For KRAS, six types of substitutions were detected (17 G12D, 9 G13D, 7 G12C, 2 G12A, 2 G12V, 2 G12S), while V600E accounted for all the BRAF activating mutations. Regarding EGFR, two cases showed exon 19 deletions (delE746-A750 and delE746-T751insA) and another two substitutions in exon 21 (one showed L858R with the resistance mutation T590M in exon 20, and the other had P848L mutation). Consistent with earlier reports, our results show that KRAS and BRAF mutation frequencies in colorectal cancer were 44.3% and 13.0%, respectively, while EGFR mutations were detected in 11.1% of the lung cancer specimens. Ultra-deep amplicon pyrosequencing successfully validated the HRM results and allowed detection and quantitation of KRAS somatic mutations. 
CONCLUSIONS: HRM is a rapid and sensitive method for moderate-throughput cost-effective screening of oncogene mutations in clinical samples. Rather than Sanger sequence validation, next-generation sequencing technology results in more accurate quantitative results in somatic variation and can be achieved at a higher throughput scale.

Revisiting the compositional data. Some fundamental questions and new prospects in Archaeometry and Archaeology

Relevance: 20.00%

Abstract:

In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from the standpoint of measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated by using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case in projective geometry, when the projective coordinates are in the positive orthant, and they have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study in architecture of Early Christian churches dated to the 5th-7th centuries AD.

Robust Factor Analysis for Compositional Data

Relevance: 20.00%

Abstract:

Factor analysis, a frequently used technique for multivariate data inspection, is also widely used for compositional data analysis. The usual approach is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then

$$y = \Lambda f + e \qquad (1)$$

with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as

$$\mathrm{Cov}(y) = \Lambda \Lambda^T + \psi \qquad (2)$$

where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimate of Cov(y).

The observed clr-transformed data Y are realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix Y, which is not the case for clr-transformed data (see, e.g., Aitchison, 1986).

The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr-transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by

$$C(Y) = V \, C(Z) \, V^T$$

where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved.

The above procedure will be applied to data from geochemistry. Our special interest is in comparing the results with those of Reimann et al. (2002) for the Kola project data.
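The relation between the clr and ilr spaces, and the back-transformation C(Y) = V C(Z) V^T, can be sketched as follows. The Helmert-type basis built here is one common choice of V and is an assumption, since the abstract does not say which orthonormal basis is used. The robust covariance estimation step itself (e.g. MCD) is not implemented; an arbitrary matrix C(Z) stands in for its result.

```python
import math


def clr(x):
    """Centered logratio transform: log parts minus their mean."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def ilr_basis(D):
    """D x (D-1) matrix V with orthonormal columns that each sum to zero."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        scale = math.sqrt(j / (j + 1.0))
        for i in range(j):
            V[i][j - 1] = scale / j
        V[j][j - 1] = -scale
    return V


def ilr(x, V):
    """ilr coordinates z = V^T clr(x); z has D-1 entries."""
    y = clr(x)
    return [sum(V[i][j] * y[i] for i in range(len(y))) for j in range(len(V[0]))]


def clr_from_ilr(z, V):
    """Back-transform coordinates: clr(x) = V z."""
    return [sum(V[i][j] * z[j] for j in range(len(z))) for i in range(len(V))]


def back_transform_cov(CZ, V):
    """C(Y) = V C(Z) V^T, mapping an ilr covariance to the clr space."""
    D, k = len(V), len(V[0])
    VC = [[sum(V[i][a] * CZ[a][b] for a in range(k)) for b in range(k)]
          for i in range(D)]
    return [[sum(VC[i][b] * V[j][b] for b in range(k)) for j in range(D)]
            for i in range(D)]


if __name__ == "__main__":
    V = ilr_basis(3)
    composition = [0.2, 0.3, 0.5]  # hypothetical composition
    print(ilr(composition, V))
    print(back_transform_cov([[1.0, 0.0], [0.0, 1.0]], V))
```

Every row of the back-transformed matrix sums to zero, which is the singularity of the clr space that prevents the full-rank robust estimators from being applied to Y directly.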

Use of high-resolution geophysical data to characterize heterogeneous aquifers: influence of data integration method on hydrological predictions

Relevance: 20.00%

Abstract:

The integration of geophysical data into the subsurface characterization problem has been shown in many cases to significantly improve hydrological knowledge by providing information at spatial scales and locations that is unattainable using conventional hydrological measurement techniques. The investigation of exactly how much benefit can be brought by geophysical data in terms of its effect on hydrological predictions, however, has received considerably less attention in the literature. Here, we examine the potential hydrological benefits brought by a recently introduced simulated annealing (SA) conditional stochastic simulation method designed for the assimilation of diverse hydrogeophysical data sets. We consider the specific case of integrating crosshole ground-penetrating radar (GPR) and borehole porosity log data to characterize the porosity distribution in saturated heterogeneous aquifers. In many cases, porosity is linked to hydraulic conductivity and thus to flow and transport behavior. To perform our evaluation, we first generate a number of synthetic porosity fields exhibiting varying degrees of spatial continuity and structural complexity. Next, we simulate the collection of crosshole GPR data between several boreholes in these fields, and the collection of porosity log data at the borehole locations. The inverted GPR data, together with the porosity logs, are then used to reconstruct the porosity field using the SA-based method, along with a number of other more elementary approaches. Assuming that the grid-cell-scale relationship between porosity and hydraulic conductivity is unique and known, the porosity realizations are then used in groundwater flow and contaminant transport simulations to assess the benefits and limitations of the different approaches.

Psychometric characteristics of the Spanish version of instruments to measure neck pain disability

Relevance: 20.00%

Abstract:

BACKGROUND. The NDI, COM and NPQ are evaluation instruments for disability due to NP. There was no Spanish version of NDI or COM for which psychometric characteristics were known. The objectives of this study were to translate and culturally adapt the Spanish version of the Neck Disability Index Questionnaire (NDI), and the Core Outcome Measure (COM), to validate their use in Spanish-speaking patients with non-specific neck pain (NP), and to compare their psychometric characteristics with those of the Spanish version of the Northwick Pain Questionnaire (NPQ). METHODS. Translation/re-translation of the English versions of the NDI and the COM was done blindly and independently by a multidisciplinary team. The study was done in 9 primary care centers and 12 specialty services from 9 regions in Spain, with 221 acute, subacute and chronic patients who visited their physician for NP: 54 in the pilot phase and 167 in the validation phase. Neck pain (VAS), referred pain (VAS), disability (NDI, COM and NPQ), catastrophizing (CSQ) and quality of life (SF-12) were measured on their first visit and 14 days later. Patients' self-assessment was used as the external criterion for pain and disability. In the pilot phase, patients' understanding of each item in the NDI and COM was assessed, and on day 1 test-retest reliability was estimated by giving a second NDI and COM in which the name of the questionnaires and the order of the items had been changed. RESULTS. Comprehensibility of NDI and COM was good. Minutes needed to fill out the questionnaires [median, (P25, P75)]: NDI: 4 (2.2, 10.0), COM: 2.1 (1.0, 4.9). Reliability [ICC, (95% CI)]: NDI: 0.88 (0.80, 0.93), COM: 0.85 (0.75, 0.91). Sensitivity to change: Effect size for patients having worsened, not changed and improved between days 1 and 15, according to the external criterion for disability: NDI: -0.24, 0.15, 0.66; NPQ: -0.14, 0.06, 0.67; COM: 0.05, 0.19, 0.92.
Validity: Results of NDI, NPQ and COM were consistent with the external criterion for disability, whereas only those from NDI were consistent with the one for pain. Correlations with VAS, CSQ and SF-12 were similar for NDI and NPQ (absolute values between 0.36 and 0.50 on day 1, between 0.38 and 0.70 on day 15), and slightly lower for COM (between 0.36 and 0.48 on day 1, and between 0.33 and 0.61 on day 15). Correlation between NDI and NPQ: r = 0.84 on day 1, r = 0.91 on day 15. Correlation between COM and NPQ: r = 0.63 on day 1, r = 0.71 on day 15. CONCLUSION. Although most psychometric characteristics of NDI, NPQ and COM are similar, those from the latter one are worse and its use may lead to patients' evolution seeming more positive than it actually is. NDI seems to be the best instrument for measuring NP-related disability, since its results are the most consistent with patient's assessment of their own clinical status and evolution. It takes two more minutes to answer the NDI than to answer the COM, but it can be reliably filled out by the patient without assistance. TRIAL REGISTRATION Clinical Trials Register NCT00349544.

Cross-sectional study on the prevalence of metabolic bone disease (MBD) and home parenteral nutrition (HPN) in Spain: data from the NADYA group

Relevance: 20.00%

Abstract:

Patients with intestinal failure who receive HPN are at high risk of developing MBD. The origin of this bone alteration is multifactorial and depends greatly on the underlying disease for which the nutritional support is required. Data on the prevalence of this disease in our environment are lacking, so the NADYA-SEMPE group has sponsored this cross-sectional study with the aim of determining the actual MBD prevalence. MATERIAL AND METHODS: Retrospective data from 51 patients from 13 hospitals were collected. The questionnaire included demographic data as well as the data most clinically relevant to MBD. Laboratory data (calciuria, PTH, 25-OH-vitamin D) and the results from the first and last bone densitometry were also recorded. RESULTS: Bone mineral density had only been assessed by densitometry in 21 patients at the moment HPN was started. Bone quality is already altered before HPN in a significant percentage of cases (52%). After a mean follow-up of 6 years, this percentage increases to 81%. Due to the retrospective nature of the study and the low number of subjects included, it has not been possible to determine the role that HPN plays in MBD etiology. Only 35% of patients have vitamin D levels above the recommended limits, and the majority of them are not on specific supplementation. CONCLUSIONS: HPN is associated with a very high risk of MBD; therefore, management protocols are needed that can lead to early detection of the problem and guide the follow-up and treatment of these patients.

Compositional data analysis: where we are and where should we be heading?

Relevance: 20.00%

Abstract:

We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view, that the challenge of solving practical problems should motivate our theoretical research; and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations. Indeed a main theme of this lecture will be to demonstrate this applied mathematical approach by a number of challenging examples.

When a data set can be considered compositional?

Relevance: 20.00%

Abstract:

Traditionally, compositional data has been identified with closed data, and the simplex has been considered as the natural sample space of this kind of data. In our opinion, the emphasis on the constrained nature of compositional data has contributed to mask its real nature. More crucial than the constraining property of compositional data is the scale-invariant property of this kind of data. Indeed, when we are considering only a few parts of a full composition we are not working with constrained data, but our data are still compositional. We believe that it is necessary to give a more precise definition of composition. This is the aim of this oral contribution.
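The scale-invariance argument can be made concrete with a small sketch: the log-ratios between parts are the same whether the data are closed to sum 1, rescaled by an arbitrary factor, or reduced to a few parts that no longer sum to a constant. The numbers and function names below are hypothetical and serve only as an illustration.

```python
import math


def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def log_ratios(x):
    """All pairwise log-ratios log(x_i / x_j) for i < j."""
    n = len(x)
    return [math.log(x[i] / x[j]) for i in range(n) for j in range(i + 1, n)]


if __name__ == "__main__":
    raw = [10.0, 20.0, 70.0]          # hypothetical amounts, not closed
    closed = closure(raw)             # the same composition closed to 1
    rescaled = [3.7 * v for v in raw]  # the same composition in other units
    subparts = closed[:2]             # two parts only; they sum to 0.3, not 1

    print(log_ratios(raw))
    print(log_ratios(closed))
    print(log_ratios(rescaled))
    print(log_ratios(subparts))       # equals the first log-ratio above
```

The two retained parts carry no sum constraint, yet their log-ratio is identical to the one computed from the full closed composition, which is the sense in which such data remain compositional.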

Spatial Analysis of Cell Composition Data

Relevance: 20.00%

Abstract:

A version of Matheron's discrete Gaussian model is applied to cell composition data. The examples are for map patterns of felsic metavolcanics in two different areas. Q-Q plots of the model for cell values representing the proportion of 10 km x 10 km cell area underlain by this rock type are approximately linear, and the line of best fit can be used to estimate the parameters of the model. It is also shown that felsic metavolcanics in the Abitibi area of the Canadian Shield can be modeled as a fractal.

Chemical products : basic guide on labelling and safety data sheets

Relevance: 20.00%

Abstract:

Hazardous chemical products have to comply with, amongst others, the provisions on correct hazard classification, labelling and compilation of the safety data sheets. The aim is to protect people's health and the environment from exposure to hazardous chemicals, especially the health and safety of direct users, professional or not, and of the general public via environmental exposure. This publication is intended to contribute to the knowledge of the objectives and basic aspects of these legal provisions, and thereby increase the degree of compliance with them in Andalusia and other European regions. This Guide is directed toward those people who, in the course of their professional activities, are in one way or another in contact with dangerous chemical products.

Comparing correspondence analysis and the log-ratio alternative for representing categorical data

Relevance: 20.00%

Abstract:

We compare correspondence analysis to the logratio approach based on compositional data. We also compare correspondence analysis and an alternative approach using Hellinger distance, for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these approaches. This coefficient can be decomposed into several components, one component for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. These three methods of representation can produce quite similar results. One illustrative example is given.
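To make two of the compared distances concrete, the sketch below computes the chi-square distance that underlies correspondence analysis and the Hellinger distance between the row profiles of a small contingency table. The table is hypothetical, the Hellinger distance is written without the optional 1/sqrt(2) normalisation, and the similarity coefficient proposed by the authors is not reproduced here.

```python
import math


def row_profiles(table):
    """Divide each row of counts by its row total."""
    return [[v / sum(row) for v in row] for row in table]


def column_masses(table):
    """Column totals divided by the grand total."""
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    return [sum(row[j] for row in table) / grand_total for j in range(n_cols)]


def chi_square_distance(p, q, masses):
    """Distance between two profiles, weighting each column by 1 / mass."""
    return math.sqrt(sum((a - b) ** 2 / c for a, b, c in zip(p, q, masses)))


def hellinger_distance(p, q):
    """Euclidean distance between square-rooted profiles (unnormalised form)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    counts = [[10, 20, 30],
              [30, 20, 10]]  # hypothetical contingency table
    profiles = row_profiles(counts)
    masses = column_masses(counts)
    print(chi_square_distance(profiles[0], profiles[1], masses))
    print(hellinger_distance(profiles[0], profiles[1]))
```

Unlike the log-ratio approach, both distances remain defined when a cell count is zero, which is one practical reason for comparing them on categorical data.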
