Abstract:
Camallanus tridentatus is redescribed on the basis of the examination of specimens obtained from the stomach, caeca and intestine of the naturally infected arapaima Arapaima gigas (Schinz) from Mexiana Island, Amazon River Delta, Brazil. Data on the surface morphology of adults, inferred from confocal laser scanning and scanning electron microscopy, are also provided. The study revealed some taxonomically important, previously unreported morphological features in this species, such as the presence of a poorly sclerotized left spicule and deirids. C. tridentatus differs distinctly from other congeneric species parasitizing freshwater fishes in South America, mainly in the structure of the buccal capsule and the female caudal end. C. maculatus Martins, Garcia, Piazza and Ghiraldelli is considered a junior synonym of Camallanus cotti Fujita.
Abstract:
Background. The use of hospital discharge administrative data (HDAD) has been recommended for automating, improving, or even substituting for population-based cancer registries. The frequency of false positive and false negative cases makes local validation advisable. Methods. The aim of this study was to detect newly diagnosed, false positive and false negative cases of cancer from hospital discharge claims, using four Spanish population-based cancer registries as the gold standard. Prostate cancer was used as a case study. Results. A total of 2286 incident cases of prostate cancer registered in 2000 were used for validation. In the most sensitive algorithm (using five diagnostic codes), sensitivity estimates ranged from 14.5% (95% CI 10.3-19.6) to 45.7% (95% CI 41.4-50.1). In the most predictive algorithm (using five diagnostic and five surgical codes), positive predictive value estimates ranged from 55.9% (95% CI 42.4-68.8) to 74.3% (95% CI 67.0-80.6). The most frequent reason for false positives was prevalent cases inadequately counted as newly diagnosed cancers, accounting for 61.1% to 82.3% of false positive cases. The most frequent reason for false negatives was cases not attended in hospital settings, accounting for 34.4% to 69.7% of false negative cases in the most predictive algorithm. Conclusions. HDAD might be a helpful tool for cancer registries to reach their goals. The findings suggest that, for automating cancer registries, algorithms combining diagnoses and procedures are the best option. However, for cancer surveillance purposes, in cancers like prostate cancer where care is not only hospital-based, combining inpatient and outpatient information will be required.
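The validation metrics reported in the abstract above reduce to simple ratios of confusion-matrix counts. A minimal sketch with made-up counts (not the study's actual data or code):

```python
# Illustrative only: sensitivity and positive predictive value as used when
# validating hospital discharge claims against a registry gold standard.
# The counts below are hypothetical.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of gold-standard (registry) cases found by the algorithm."""
    return tp / (tp + fn)

def ppv(tp: int, fp: int) -> float:
    """Fraction of algorithm-flagged cases that are true registry cases."""
    return tp / (tp + fp)

# Hypothetical counts for one registry and one algorithm:
tp, fp, fn = 330, 115, 392
print(f"Sensitivity: {sensitivity(tp, fn):.1%}")
print(f"PPV:         {ppv(tp, fp):.1%}")
```

A prevalent case wrongly flagged as newly diagnosed enters as a false positive (lowering PPV), while a case never seen in hospital enters as a false negative (lowering sensitivity), which is the trade-off the abstract describes.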
Abstract:
In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcanogenic, a hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used and the cosine-theta coefficient as similarity measure.

During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. It therefore makes sense to check whether the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005), starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transforming the components and visualizing the factor scores in a spatial context: the compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process.

Key words: compositional data analysis, biplot, deep sea sediments
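The clr and ilr transformations central to the workflow above can be sketched in a few lines. This assumes the standard pivot (Helmert-type) orthonormal basis for ilr; the component values are invented:

```python
# Minimal sketch of the centred (clr) and isometric (ilr) logratio
# transformations, assuming the standard pivot basis. Not the paper's code.
import numpy as np

def closure(x):
    """Rescale a vector of positive parts to sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(x):
    """Centred logratio: log parts minus the mean of the log parts."""
    lx = np.log(closure(x))
    return lx - lx.mean()

def ilr(x):
    """Isometric logratio via the pivot basis: D parts -> D-1 coordinates."""
    x = closure(x)
    D = x.size
    z = np.empty(D - 1)
    for i in range(1, D):
        g = np.exp(np.mean(np.log(x[:i])))   # geometric mean of first i parts
        z[i - 1] = np.sqrt(i / (i + 1)) * np.log(g / x[i])
    return z

comp = [70.0, 20.0, 10.0]                    # e.g. three geochemical components
print(np.round(clr(comp), 3), np.round(ilr(comp), 3))
```

The clr coordinates sum to zero (hence the singularity issues noted elsewhere in this collection), while the ilr coordinates are unconstrained and preserve the Aitchison norm.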
Abstract:
Planners in public and private institutions would like coherent forecasts of the components of age-specific mortality, such as causes of death. This has been difficult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It has been shown that cause-specific mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and forecasts the density of deaths in the life table. Since these values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit sum constraint is honoured. The structure of the best-known single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan.
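The transform-model-back-transform cycle described above can be sketched as follows; the toy death density and the "forecast" perturbation are invented for illustration, and the clr transform stands in for whichever logratio transform the paper uses:

```python
# Sketch: a life-table death density sums to 1 across ages, so it is mapped
# to unconstrained real space (here via clr), modelled there, and mapped back
# so the unit-sum constraint is honoured. Toy numbers throughout.
import numpy as np

def clr(d):
    ld = np.log(d)
    return ld - ld.mean()

def clr_inverse(y):
    """Back-transform to positive values and restore the unit-sum constraint."""
    e = np.exp(y)
    return e / e.sum()

deaths = np.array([0.05, 0.10, 0.25, 0.40, 0.20])   # toy death density, sums to 1
y = clr(deaths)                                      # unconstrained coordinates
y_forecast = y + np.array([0.0, -0.1, 0.0, 0.1, 0.0])  # illustrative model step
d_forecast = clr_inverse(y_forecast)
print(d_forecast, d_forecast.sum())                  # still a valid density
```

Whatever linear model is fitted in the transformed space, the inverse transform guarantees positive values with unit sum, which is the coherence property the abstract emphasizes.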
Abstract:
The self-organizing map (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric in the adaptation of the model vectors, which in theory renders the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem:
1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data.
2. Whether a modification of the conventional approach, replacing vector sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution.
Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved".
Key words: Self Organizing Map; Artificial Neural Networks; Compositional Data
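The canonical simplex operators named in point 2 are easy to state in code. This is a generic numpy sketch of Aitchison's perturbation and powering, not the authors' SOM implementation:

```python
# Perturbation and powering: the simplex analogues of vector addition and
# scalar multiplication that the modified SOM substitutes for the Euclidean
# operations. Generic sketch; function names are ours.
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, y):
    """Simplex 'addition': componentwise product, then closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Simplex 'scalar multiplication': componentwise power, then closure."""
    return closure(np.asarray(x, dtype=float) ** a)

x = closure([1, 2, 3])
n = closure([1, 1, 1])        # neutral element of perturbation
print(perturb(x, n))           # x is unchanged
print(perturb(x, power(x, -1.0)))  # perturbing by the inverse recovers n
```

A SOM update step in this geometry would move a model vector towards a data point by perturbing it with a powered ratio, instead of adding a scaled Euclidean difference.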
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA arises when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal component analysis (PCA) with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of which takes into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made in which the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
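One example of a power transformation that can serve as such a linking parameter is the Box-Cox transform, which tends to the logarithm as its exponent goes to zero and so bridges analyses of raw and of log-transformed data. A small sketch (illustrative; not necessarily the exact parametrization used in the presentation):

```python
# The Box-Cox power transform: alpha = 1 is essentially the raw data (shifted),
# and alpha -> 0 recovers the natural logarithm. Toy values.
import numpy as np

def box_cox(x, alpha):
    """Box-Cox transform; the alpha -> 0 limit is log(x)."""
    x = np.asarray(x, dtype=float)
    if alpha == 0:
        return np.log(x)
    return (x ** alpha - 1.0) / alpha

x = np.array([0.5, 1.0, 2.0, 4.0])
for a in (1.0, 0.5, 0.1, 0.001):
    print(a, np.round(box_cox(x, a), 4))   # rows approach log(x) as a shrinks
```

Letting such a parameter vary in small steps, and redrawing the resulting map at each step, is exactly the "frame by frame" idea the abstract describes.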
Abstract:
In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated by using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case of projective geometry, in which the projective coordinates lie in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study in the architecture of Early Christian churches dated back to the 5th-7th centuries AD.
Abstract:
Factor analysis, a frequent technique for multivariate data inspection, is widely used also for compositional data analysis. The usual way is to use a centred logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then

y = Λf + e   (1)

with the factors f of dimension k ≪ D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as

Cov(y) = ΛΛ^T + ψ   (2)

where ψ = Cov(e) has diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimate of Cov(y).

Let Y denote observed clr-transformed data, realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix Y, which is not the case for clr-transformed data (see, e.g., Aitchison, 1986).

The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr-transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by

C(Y) = V C(Z) V^T

where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation, since the links to the original variables are still preserved.

The above procedure will be applied to data from geochemistry. Our special interest is in comparing the results with those of Reimann et al. (2002) for the Kola project data.
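The back-transformation C(Y) = V C(Z) V^T at the heart of the abstract above can be sketched as follows. The contrast matrix V assumes a Helmert-type orthonormal basis, and an ordinary sample covariance stands in for the robust MCD/S estimate:

```python
# Sketch of the ilr -> clr covariance back-transform on toy compositions.
# V has orthonormal columns spanning the clr plane; a robust estimator
# (MCD, S) would replace np.cov in the papers' actual procedure.
import numpy as np

def contrast_matrix(D):
    """D x (D-1) matrix V with orthonormal columns, V^T V = I_{D-1}."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1))
    return V

rng = np.random.default_rng(0)
D = 4
V = contrast_matrix(D)

X = rng.dirichlet(np.ones(D) * 5, size=200)              # toy compositions
Y = np.log(X) - np.log(X).mean(axis=1, keepdims=True)    # clr data (singular)
Z = Y @ V                                                 # ilr data (full rank)

C_Z = np.cov(Z, rowvar=False)    # a robust estimator would go here
C_Y = V @ C_Z @ V.T              # back-transformed clr-space covariance
print(np.round(C_Y.sum(axis=1), 6))   # rows sum to ~0: clr singularity preserved
```

Because each column of V sums to zero, C_Y inherits the zero row-sums of the clr space, so the back-transformed estimate stays interpretable in terms of the original parts.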
Abstract:
The integration of geophysical data into the subsurface characterization problem has been shown in many cases to significantly improve hydrological knowledge by providing information at spatial scales and locations that is unattainable using conventional hydrological measurement techniques. The investigation of exactly how much benefit can be brought by geophysical data in terms of its effect on hydrological predictions, however, has received considerably less attention in the literature. Here, we examine the potential hydrological benefits brought by a recently introduced simulated annealing (SA) conditional stochastic simulation method designed for the assimilation of diverse hydrogeophysical data sets. We consider the specific case of integrating crosshole ground-penetrating radar (GPR) and borehole porosity log data to characterize the porosity distribution in saturated heterogeneous aquifers. In many cases, porosity is linked to hydraulic conductivity and thus to flow and transport behavior. To perform our evaluation, we first generate a number of synthetic porosity fields exhibiting varying degrees of spatial continuity and structural complexity. Next, we simulate the collection of crosshole GPR data between several boreholes in these fields, and the collection of porosity log data at the borehole locations. The inverted GPR data, together with the porosity logs, are then used to reconstruct the porosity field using the SA-based method, along with a number of other more elementary approaches. Assuming that the grid-cell-scale relationship between porosity and hydraulic conductivity is unique and known, the porosity realizations are then used in groundwater flow and contaminant transport simulations to assess the benefits and limitations of the different approaches.
Abstract:
We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view: that the challenge of solving practical problems should motivate our theoretical research, and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations. Indeed a main theme of this lecture will be to demonstrate this applied mathematical approach by a number of challenging examples.
Abstract:
Traditionally, compositional data have been identified with closed data, and the simplex has been considered the natural sample space for this kind of data. In our opinion, the emphasis on the constrained nature of compositional data has contributed to masking their real nature. More crucial than the constraining property of compositional data is their scale-invariant property. Indeed, when we consider only a few parts of a full composition we are not working with constrained data, but our data are still compositional. We believe that it is necessary to give a more precise definition of composition. This is the aim of this oral contribution.
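The scale-invariance argument above can be demonstrated directly: a vector of parts, its closed version and any positive rescaling of it all carry the same relative information, because every pairwise ratio is unchanged. Toy numbers:

```python
# Scale invariance in one check: closure and arbitrary positive rescaling
# leave all pairwise ratios (hence all logratios) unchanged. Toy data.
import numpy as np

def pairwise_ratios(x):
    """Matrix of ratios x_i / x_j for all pairs of parts."""
    x = np.asarray(x, dtype=float)
    return x[:, None] / x[None, :]

raw    = np.array([2.0, 4.0, 6.0])   # raw measurements
closed = raw / raw.sum()             # same data, closed to sum 1
scaled = 37.5 * raw                  # arbitrary positive rescaling

print(np.allclose(pairwise_ratios(raw), pairwise_ratios(closed)))
print(np.allclose(pairwise_ratios(raw), pairwise_ratios(scaled)))
```

This is why a subcomposition of a few parts, which no longer sums to any fixed constant, is still compositional: the ratios, not the constraint, carry the information.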
Abstract:
A version of Matheron's discrete Gaussian model is applied to cell composition data. The examples are for map patterns of felsic metavolcanics in two different areas. Q-Q plots of the model for cell values representing the proportion of 10 km x 10 km cell area underlain by this rock type are approximately linear, and the line of best fit can be used to estimate the parameters of the model. It is also shown that felsic metavolcanics in the Abitibi area of the Canadian Shield can be modeled as a fractal.
Abstract:
Hazardous chemical products have to comply with, among other requirements, the provisions on correct classification of danger, labelling and compilation of safety data sheets. The aim is to protect people's health and the environment from exposure to hazardous chemicals, especially the health and safety of direct users, whether professionals or not, and of the general public via environmental exposure. This publication is intended to contribute to knowledge of the objectives and basic aspects of these legal provisions, and thereby to increase the degree of compliance in Andalusia and other European regions. This Guide is directed toward those people who, in the course of their professional activities, are in one way or another in contact with dangerous chemical products.